GCP Professional Data Engineer Practice Question

Your company uses Cloud Composer to orchestrate a nightly PySpark ETL job that must run on an ephemeral Dataproc cluster in the same region as the data. The pipeline must always delete the cluster, even when the Spark job crashes, because another workflow creates a cluster with the identical name the next night. Which DAG design best satisfies this requirement while following Google-recommended patterns?

  • Wrap the Spark code in DataflowRunner and launch it with DataflowOperator, because Dataflow automatically tears down workers on failure.

  • Pass an "auto_delete": true flag to DataprocSubmitJobOperator so Dataproc deletes the cluster after the job finishes.

  • Attach an on_failure_callback to DataprocSubmitJobOperator that runs a gcloud dataproc clusters delete command when the task fails.

  • Chain DataprocCreateClusterOperator → DataprocSubmitJobOperator → DataprocDeleteClusterOperator and set trigger_rule="all_done" on the delete task.
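The last option reflects the recommended pattern: the delete task runs regardless of whether the Spark job succeeded, because trigger_rule="all_done" fires once all upstream tasks have finished in any state. Below is a minimal sketch of that DAG, assuming Airflow 2.x with the apache-airflow-providers-google package installed; the project ID, region, cluster name, cluster sizing, and PySpark file URI are placeholders, not values from the question.

# Minimal create -> submit -> delete DAG sketch (illustrative only).
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)
from airflow.utils.trigger_rule import TriggerRule

PROJECT_ID = "my-project"        # placeholder
REGION = "us-central1"           # same region as the data
CLUSTER_NAME = "nightly-etl"     # reused every night, so it must always be deleted

with DAG(
    dag_id="nightly_pyspark_etl",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        },
    )

    submit_job = DataprocSubmitJobOperator(
        task_id="submit_pyspark_job",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "placement": {"cluster_name": CLUSTER_NAME},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/etl.py"},  # placeholder
        },
    )

    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        # all_done runs this task whether the Spark job succeeded or failed,
        # freeing the cluster name for the next night's run.
        trigger_rule=TriggerRule.ALL_DONE,
    )

    create_cluster >> submit_job >> delete_cluster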

Exam: GCP Professional Data Engineer
Objective: Maintaining and automating data workloads