
GCP Professional Data Engineer Practice Question

A retail analytics division runs nightly Spark ETL pipelines for four business units. All jobs currently share a single persistent Dataproc cluster backed by Cloud Storage. During month-end closes, long-running joins from one unit starve executors needed by others, and the central platform team spends hours tuning YARN queues. Management asks you to eliminate cross-team resource contention, keep costs low when jobs are idle, and avoid complex capacity management. What should you do?

  • Launch an ephemeral Dataproc cluster for each team's nightly job, run the Spark pipeline, and configure the cluster to delete itself when the job succeeds or fails.

  • Enable autoscaling on the persistent cluster and add preemptible secondary workers to handle month-end peaks.

  • Migrate the Spark pipelines to BigQuery SQL using the Spark BigQuery connector for data movement, and execute them as scheduled queries with on-demand pricing.

  • Resize the existing persistent cluster to a larger machine class and define stricter YARN capacity scheduler queues for each business unit.
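The ephemeral per-team cluster described in the first option can be expressed with a Dataproc workflow template, whose managed cluster is created when the workflow starts and deleted automatically when the job finishes, whether it succeeds or fails. A minimal sketch, assuming hypothetical names for the region, cluster, main class, and jar location:

```shell
# Create a workflow template for one team's nightly Spark ETL job.
# Names, region, and machine sizes below are placeholders.
gcloud dataproc workflow-templates create nightly-etl-sales \
    --region=us-central1

# Attach a managed (ephemeral) cluster: Dataproc provisions it at
# workflow start and tears it down when the workflow completes.
gcloud dataproc workflow-templates set-managed-cluster nightly-etl-sales \
    --region=us-central1 \
    --cluster-name=sales-etl \
    --num-workers=4 \
    --master-machine-type=n1-standard-4 \
    --worker-machine-type=n1-standard-4

# Add the Spark job step (hypothetical class and GCS jar path).
gcloud dataproc workflow-templates add-job spark nightly-etl-sales \
    --region=us-central1 \
    --step-id=etl \
    --class=com.example.SalesEtl \
    --jars=gs://example-bucket/sales-etl.jar

# Run the workflow: cluster up, job runs, cluster deleted.
gcloud dataproc workflow-templates instantiate nightly-etl-sales \
    --region=us-central1
```

Because each business unit gets its own short-lived cluster, there is no shared YARN queue to tune and no idle capacity to pay for between nightly runs.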

GCP Professional Data Engineer
Maintaining and automating data workloads