GCP Professional Data Engineer Practice Question
Objective: Maintaining and automating data workloads

Your analytics team schedules a 4-hour Spark ETL job every night on a Dataproc cluster with one master node and 20 n2-standard-4 workers. To keep the cluster available for occasional ad-hoc jobs, the team leaves it running the remaining 20 hours each day, resulting in high compute charges for mostly idle resources. Management asks you to reduce costs without extending the nightly ETL completion time or sacrificing ad-hoc flexibility. What should you do?

  • Enable Dataproc autoscaling on the existing cluster so worker nodes scale down during idle periods while keeping the master and two workers running.

  • Configure Cloud Composer to spin up a job-scoped Dataproc cluster each night (and for any ad-hoc submission), run the Spark job, then delete the cluster after completion (a DAG sketch of this pattern follows the options).

  • Keep the persistent cluster but convert all worker nodes to preemptible VMs to lower the hourly rate.

  • Rewrite the workload as a continuous Dataflow streaming pipeline that runs as a single, permanently provisioned regional job.
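
For context, the second option describes the ephemeral, job-scoped cluster pattern, which Cloud Composer can orchestrate with the Dataproc operators from the Google provider package. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: the project ID, region, bucket, jar path, main class, and schedule are hypothetical placeholders, and it assumes Airflow 2.4+ with apache-airflow-providers-google installed.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)

# Hypothetical values for illustration only.
PROJECT_ID = "my-analytics-project"
REGION = "us-central1"
CLUSTER_NAME = "nightly-etl-{{ ds_nodash }}"  # unique name per run

# Mirrors the scenario: 1 master, 20 n2-standard-4 workers.
CLUSTER_CONFIG = {
    "master_config": {"num_instances": 1, "machine_type_uri": "n2-standard-4"},
    "worker_config": {"num_instances": 20, "machine_type_uri": "n2-standard-4"},
}

SPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "spark_job": {
        "main_class": "com.example.etl.NightlyJob",  # hypothetical entry point
        "jar_file_uris": ["gs://my-etl-bucket/jobs/nightly-etl.jar"],
    },
}

with DAG(
    dag_id="nightly_spark_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # nightly; use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config=CLUSTER_CONFIG,
    )

    run_etl = DataprocSubmitJobOperator(
        task_id="run_etl",
        project_id=PROJECT_ID,
        region=REGION,
        job=SPARK_JOB,
    )

    # Tear the cluster down even if the Spark job fails,
    # so no idle nodes linger and accrue charges.
    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule="all_done",
    )

    create_cluster >> run_etl >> delete_cluster
```

Chaining the delete task with trigger_rule="all_done" ensures the cluster is removed even when the Spark job fails, so compute is billed only for the roughly four hours the job actually runs; ad-hoc submissions can follow the same pattern with their own short-lived clusters.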
