GCP Professional Data Engineer Practice Question

Your media analytics team runs 50 independent Hive batch transformations every night on a 20-node single-master Dataproc cluster. Each job completes in about 40 minutes, after which the cluster sits idle until the next evening. Finance has asked you to reduce the cluster's compute cost by at least 60 percent, but you must continue using the existing Hive scripts and need the flexibility to set custom initialization actions for individual jobs. What should you do?

  • Downsize the persistent cluster to five workers and add preemptible local SSDs to accelerate the nightly ETL without changing the job structure.

  • Keep the existing cluster but attach an autoscaling policy that reduces primary workers to zero during idle periods to avoid VM charges.

  • Rewrite the Hive transformations for BigQuery and schedule them as low-priority batch queries to take advantage of on-demand pricing.

  • Package each nightly Hive job in a Dataproc workflow template that launches a job-scoped cluster with the required initialization actions, executes the job against data in Cloud Storage, and automatically deletes the cluster when the job finishes.
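For context on the workflow-template approach described in the last option, here is a minimal sketch of the gcloud commands involved. The template name, region, bucket paths, and script names are all illustrative, not from the scenario:

```shell
# Create a workflow template for one of the nightly Hive jobs
# (names and paths below are hypothetical placeholders).
gcloud dataproc workflow-templates create nightly-etl-01 \
    --region=us-central1

# Attach a managed (job-scoped) cluster: it is created when the
# template is instantiated and deleted automatically when the job
# finishes, so no VMs sit idle between runs. Per-job initialization
# actions are set here.
gcloud dataproc workflow-templates set-managed-cluster nightly-etl-01 \
    --region=us-central1 \
    --cluster-name=nightly-etl-01-cluster \
    --num-workers=2 \
    --initialization-actions=gs://example-bucket/init/install-deps.sh

# Add the existing Hive script, reading and writing data in Cloud Storage.
gcloud dataproc workflow-templates add-job hive \
    --workflow-template=nightly-etl-01 \
    --region=us-central1 \
    --step-id=transform-01 \
    --file=gs://example-bucket/hive/transform_01.hql

# Instantiate the template each night (e.g. from Cloud Scheduler
# or Cloud Composer); the cluster lives only for the ~40-minute job.
gcloud dataproc workflow-templates instantiate nightly-etl-01 \
    --region=us-central1
```

Because compute is billed only while each job-scoped cluster exists, this pattern can cut nightly cluster cost well beyond the 60 percent target while leaving the Hive scripts unchanged.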

Exam: GCP Professional Data Engineer
Objective: Maintaining and automating data workloads