GCP Professional Data Engineer Practice Question

Your team operates a Dataproc cluster with three master nodes and 30 workers to run a two-hour Spark ETL job every night. For the remaining 22 hours the cluster sits idle, yet every run of the Spark application requires the same custom image and machine type. After a 40% budget cut, you must dramatically reduce compute spend while keeping job performance and configuration isolation intact. What should you do?

  • Use a Dataproc workflow template (or Cloud Composer DAG) that creates a job-scoped Dataproc cluster with the required image and machine type, runs the Spark job, and automatically deletes the cluster when the job completes (see the sketch after the options).

  • Rewrite the Spark application for Cloud Dataflow and schedule it nightly with a template launch.

  • Convert the existing cluster to an autoscaling persistent cluster with a minimum of zero workers and only preemptible secondary workers.

  • Enable Dataproc High Availability and schedule the cluster to hibernate during idle hours with Cloud Scheduler scripts.
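For reference, the job-scoped (ephemeral) cluster pattern described in the first option could be orchestrated from Cloud Composer with a DAG roughly like the minimal sketch below. This is only an illustration: the project ID, region, machine type, custom image name, main class, and jar path are placeholder assumptions, and the cluster shape should be sized to match the existing job.

```python
# Minimal Cloud Composer (Airflow) sketch of the ephemeral-cluster pattern:
# create a job-scoped Dataproc cluster with the custom image and machine type,
# run the nightly Spark job, then delete the cluster even if the job fails.
# All identifiers below (project, region, image, bucket, class) are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)
from airflow.utils.trigger_rule import TriggerRule

PROJECT_ID = "my-project"                      # placeholder
REGION = "us-central1"                         # placeholder
CLUSTER_NAME = "nightly-etl-{{ ds_nodash }}"   # job-scoped, unique per run

CLUSTER_CONFIG = {
    # A single master is typically enough for a short-lived, job-scoped cluster;
    # HA masters mainly matter for long-running clusters.
    "master_config": {
        "num_instances": 1,
        "machine_type_uri": "n1-standard-8",   # keep the machine type the job needs
        "image_uri": "projects/my-project/global/images/spark-etl-image",  # custom image (placeholder)
    },
    "worker_config": {
        "num_instances": 30,
        "machine_type_uri": "n1-standard-8",
        "image_uri": "projects/my-project/global/images/spark-etl-image",
    },
}

SPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "spark_job": {
        "main_class": "com.example.etl.NightlyJob",               # placeholder
        "jar_file_uris": ["gs://my-bucket/jars/nightly-etl.jar"],  # placeholder
    },
}

with DAG(
    dag_id="nightly_spark_etl_ephemeral_cluster",
    schedule_interval="0 1 * * *",   # once per night
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config=CLUSTER_CONFIG,
    )

    run_spark = DataprocSubmitJobOperator(
        task_id="run_spark_etl",
        project_id=PROJECT_ID,
        region=REGION,
        job=SPARK_JOB,
    )

    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule=TriggerRule.ALL_DONE,  # tear down even if the Spark job fails
    )

    create_cluster >> run_spark >> delete_cluster
```

Because the cluster exists only for the roughly two-hour run, compute is billed for about 2 of every 24 hours, and each run gets an identical, isolated configuration defined in the template or DAG.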

Exam objective: Maintaining and automating data workloads