Your analytics team keeps a 40-node Dataproc cluster running 24×7 for two purposes: a 2-hour Spark ETL job that starts at 00:00 each night, and interactive PySpark notebooks during business hours. Monitoring shows that, averaged over a week, more than 80% of the cluster's CPU is idle. Management asks you to cut compute cost without increasing startup latency for notebook users. What should you do?
Keep a small persistent Dataproc cluster for notebooks and launch the nightly ETL on an ephemeral job-scoped cluster that deletes itself when the job completes.
Move the ETL to a scheduled BigQuery query and keep the 40-node Dataproc cluster for interactive notebooks.
Convert the existing cluster's secondary workers to preemptible VMs and continue running notebooks and the nightly ETL on the same cluster.
Attach an autoscaling policy to the current cluster and set its worker count range to 0-40 so the cluster scales down to zero after the ETL finishes.
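The correct choice is to keep a small persistent cluster for notebooks and run the nightly ETL on an ephemeral, job-scoped cluster. A small, always-on Dataproc cluster sized only for interactive notebooks preserves the low startup latency that analysts expect. Submitting the nightly ETL to a job-scoped (ephemeral) Dataproc cluster means the master and worker VMs for that batch workload are created just in time and deleted as soon as the job finishes, so the batch capacity accrues no charges during the roughly 22 hours a day when the ETL is not running. Autoscaling the existing cluster cannot remove the master nodes, and a standard Dataproc cluster keeps a minimum of two primary workers, so a 0-40 worker range does not bring the cluster down to zero and some idle cost remains. Converting secondary workers to preemptible VMs lowers the price of idle capacity but does not remove it, and preemptions can destabilize notebook sessions. Migrating the ETL to BigQuery leaves the 40-node Dataproc cluster running for notebooks, so spend is not minimized.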
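To make the cost argument concrete, the following is a rough sketch that compares weekly node-hours for the current setup against the split design. It uses the 40 nodes and 2-hour nightly ETL from the question; the notebook cluster size of 5 nodes is a hypothetical value chosen only for illustration, and the calculation assumes every node has the same price while ignoring cluster startup time, disks, and master-node differences.

```python
# Rough node-hour comparison; not a pricing calculator.
# Assumption: all nodes cost the same per hour, so node-hours track cost.

DAYS_PER_WEEK = 7


def weekly_node_hours(nodes, hours_per_day, days=DAYS_PER_WEEK):
    """Node-hours consumed in a week by `nodes` running `hours_per_day`."""
    return nodes * hours_per_day * days


def compare(notebook_nodes=5, etl_nodes=40, etl_hours=2):
    """Compare the always-on 40-node cluster with the split design.

    notebook_nodes is a hypothetical size for the small persistent cluster;
    etl_nodes and etl_hours come from the scenario in the question.
    """
    baseline = weekly_node_hours(40, 24)               # current 24x7 cluster
    persistent = weekly_node_hours(notebook_nodes, 24)  # small notebook cluster
    ephemeral = weekly_node_hours(etl_nodes, etl_hours)  # nightly job-scoped cluster
    proposed = persistent + ephemeral
    return {
        "baseline": baseline,
        "proposed": proposed,
        "savings_fraction": 1 - proposed / baseline,
    }


if __name__ == "__main__":
    result = compare()
    print(f"baseline node-hours/week: {result['baseline']}")
    print(f"proposed node-hours/week: {result['proposed']}")
    print(f"estimated savings: {result['savings_fraction']:.1%}")
```

Under these assumptions the ephemeral ETL cluster contributes only 560 node-hours per week, so the total is dominated by however large the persistent notebook cluster needs to be.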
GCP Professional Data Engineer
Maintaining and automating data workloads