Your company uses Cloud Composer to orchestrate a nightly PySpark ETL job that must run on an ephemeral Dataproc cluster in the same region as the data. The pipeline must always delete the cluster, even when the Spark job crashes, because another workflow creates a cluster with the identical name the next night. Which DAG design best satisfies this requirement while following Google-recommended patterns?
Wrap the Spark code in DataflowRunner and launch it with DataflowOperator, because Dataflow automatically tears down workers on failure.
Pass an "auto_delete": true flag to DataprocSubmitJobOperator so Dataproc deletes the cluster after the job finishes.
Attach an on_failure_callback to DataprocSubmitJobOperator that runs a gcloud dataproc clusters delete command when the task fails.
Chain DataprocCreateClusterOperator ➔ DataprocSubmitJobOperator ➔ DataprocDeleteClusterOperator and set trigger_rule="all_done" on the delete task.
DataprocSubmitJobOperator cannot create an ephemeral cluster itself; it only submits a job to an existing cluster. The common pattern in Cloud Composer is therefore three separate tasks: create the cluster, run the job, and delete the cluster. Setting trigger_rule="all_done" (TriggerRule.ALL_DONE) on the DataprocDeleteClusterOperator task makes it run once its upstream tasks have finished, whether they succeeded or failed, so the cluster is removed before the next night's run. The other options fall short: Dataflow runs Apache Beam pipelines rather than PySpark jobs, DataprocSubmitJobOperator has no auto_delete parameter, and an on_failure_callback fires only when the task fails, so it does no cleanup after a successful run.
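The create, submit, delete chain might be sketched as follows. This is a minimal illustration, not a reference implementation: the project ID, region, cluster name, bucket path, machine types, and DAG ID are hypothetical placeholders, and the `schedule` argument assumes Airflow 2.4 or later. The Airflow imports are deferred into a factory function so the configuration dictionaries can be inspected without Airflow installed.

```python
from datetime import datetime

# Hypothetical placeholders; substitute your own values.
PROJECT_ID = "my-project"
REGION = "us-central1"
CLUSTER_NAME = "nightly-etl"
PYSPARK_URI = "gs://my-bucket/jobs/etl.py"

# Small cluster definition (machine types are assumptions).
CLUSTER_CONFIG = {
    "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
    "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
}

# Dataproc job definition that targets the cluster created by the first task.
PYSPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "pyspark_job": {"main_python_file_uri": PYSPARK_URI},
}


def build_dag():
    """Build the nightly DAG: create cluster, run job, always delete cluster."""
    # Imported here so this module loads even where Airflow is absent.
    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
        DataprocDeleteClusterOperator,
        DataprocSubmitJobOperator,
    )

    with DAG(
        dag_id="nightly_pyspark_etl",  # hypothetical DAG ID
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # assumes Airflow >= 2.4
        catchup=False,
    ) as dag:
        create_cluster = DataprocCreateClusterOperator(
            task_id="create_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            cluster_config=CLUSTER_CONFIG,
        )
        submit_job = DataprocSubmitJobOperator(
            task_id="submit_job",
            project_id=PROJECT_ID,
            region=REGION,
            job=PYSPARK_JOB,
        )
        delete_cluster = DataprocDeleteClusterOperator(
            task_id="delete_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            # Run once upstream tasks have finished, whatever their state.
            trigger_rule="all_done",
        )
        create_cluster >> submit_job >> delete_cluster
    return dag
```

In a real DAG file you would assign `dag = build_dag()` at module level so the scheduler discovers it. One caveat worth checking in your own environment: because the delete task is the final task and normally succeeds, the DAG run can end up marked successful even when the Spark job failed, so failure alerting may need to be attached to the submit task itself.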
GCP Professional Data Engineer
Maintaining and automating data workloads