GCP Professional Data Engineer Practice Question

Your company uses Cloud Composer to orchestrate a nightly PySpark ETL job that must run on an ephemeral Dataproc cluster in the same region as the data. The pipeline must always delete the cluster, even when the Spark job crashes, because another workflow creates a cluster with the identical name the next night. Which DAG design best satisfies this requirement while following Google-recommended patterns?

  • Wrap the Spark code in DataflowRunner and launch it with DataflowOperator, because Dataflow automatically tears down workers on failure.

  • Pass an "auto_delete": true flag to DataprocSubmitJobOperator so Dataproc deletes the cluster after the job finishes.

  • Attach an on_failure_callback to DataprocSubmitJobOperator that runs a gcloud dataproc clusters delete command when the task fails.

  • Chain DataprocCreateClusterOperator → DataprocSubmitJobOperator → DataprocDeleteClusterOperator and set trigger_rule="all_done" on the delete task.
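The last option reflects the recommended pattern: the delete task runs regardless of whether the Spark job succeeded, because trigger_rule="all_done" fires once all upstream tasks have finished in any state. Below is a minimal sketch of that DAG, assuming Airflow 2.x with the apache-airflow-providers-google package installed; the project ID, region, cluster name, cluster sizing, and PySpark file URI are placeholders, not values from the question.

# Minimal create -> submit -> delete DAG sketch (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)
from airflow.utils.trigger_rule import TriggerRule

PROJECT_ID = "my-project"        # placeholder
REGION = "us-central1"           # same region as the data
CLUSTER_NAME = "nightly-etl"     # reused every night, so it must always be deleted

with DAG(
    dag_id="nightly_pyspark_etl",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        },
    )

    submit_job = DataprocSubmitJobOperator(
        task_id="submit_pyspark_job",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "placement": {"cluster_name": CLUSTER_NAME},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/etl.py"},  # placeholder
        },
    )

    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        # all_done runs this task whether the Spark job succeeded or failed,
        # freeing the cluster name for the next night's run.
        trigger_rule=TriggerRule.ALL_DONE,
    )

    create_cluster >> submit_job >> delete_cluster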

Exam: GCP Professional Data Engineer
Objective: Maintaining and automating data workloads