GCP Professional Data Engineer Practice Question

Your analytics team keeps a 40-node Dataproc cluster running 24×7 to execute a 2-hour Spark ETL job that starts at 00:00 and to host interactive PySpark notebooks during business hours. Monitoring shows that over 80% of cluster CPU sits idle over a typical week. Management asks you to cut compute cost without increasing startup latency for the notebook users. What should you do?

  • Attach an autoscaling policy to the current cluster and set its worker count range to 0-40 so the cluster scales down to zero after the ETL finishes.

  • Convert the existing cluster's secondary workers to preemptible VMs and continue running notebooks and the nightly ETL on the same cluster.

  • Keep a small persistent Dataproc cluster for notebooks and launch the nightly ETL on an ephemeral, job-scoped cluster that deletes itself when the job completes (the pattern sketched after these options).

  • Move the ETL to a scheduled BigQuery query and keep the 40-node Dataproc cluster for interactive notebooks.
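For reference, the job-scoped pattern named in the third option is what Dataproc calls an ephemeral or managed cluster, typically launched through a workflow template. Below is a minimal sketch using the google-cloud-dataproc Python client's inline workflow templates, which provision a cluster, run the job, and delete the cluster automatically when the job finishes. The project ID, region, machine types, and Cloud Storage script path are illustrative placeholders, not values given in the question.

```python
from google.cloud import dataproc_v1 as dataproc

# Placeholder values -- substitute your own project, region, and GCS paths.
PROJECT_ID = "my-project"
REGION = "us-central1"
ETL_SCRIPT_URI = "gs://my-bucket/jobs/nightly_etl.py"


def run_nightly_etl():
    """Instantiate an inline workflow template: Dataproc creates a managed
    (job-scoped) cluster, runs the PySpark step, then deletes the cluster
    when the step completes."""
    client = dataproc.WorkflowTemplateServiceClient(
        client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    )
    template = {
        "placement": {
            "managed_cluster": {
                # Ephemeral cluster: exists only for the duration of this run.
                "cluster_name": "nightly-etl",
                "config": {
                    # Empty zone_uri lets Dataproc auto-select a zone.
                    "gce_cluster_config": {"zone_uri": ""},
                    "master_config": {
                        "num_instances": 1,
                        "machine_type_uri": "n1-standard-4",
                    },
                    "worker_config": {
                        "num_instances": 40,
                        "machine_type_uri": "n1-standard-4",
                    },
                },
            }
        },
        "jobs": [
            {
                "step_id": "spark-etl",
                "pyspark_job": {"main_python_file_uri": ETL_SCRIPT_URI},
            }
        ],
    }
    operation = client.instantiate_inline_workflow_template(
        request={
            "parent": f"projects/{PROJECT_ID}/regions/{REGION}",
            "template": template,
        }
    )
    # Blocks until the job completes and the managed cluster is deleted.
    operation.result()


if __name__ == "__main__":
    run_nightly_etl()
```

A nightly trigger such as Cloud Scheduler or Cloud Composer could invoke this script. The cost-relevant property is that the 40 workers exist only for the roughly 2-hour ETL window, while a separate small always-on cluster keeps notebook startup latency unchanged.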

Objective: Maintaining and automating data workloads