Your team operates a Dataproc cluster with three masters and 30 workers to run a two-hour Spark ETL job every night. For the remaining 22 hours the cluster is idle, yet the Spark application requires the same custom image and machine type for every run. After a 40% budget cut, you must dramatically reduce compute spend while keeping job performance and configuration isolation intact. What should you do?
Use a Dataproc workflow template (or Cloud Composer DAG) that creates a job-scoped Dataproc cluster with the required image and machine type, runs the Spark job, and automatically deletes the cluster when the job completes.
Rewrite the Spark application for Cloud Dataflow and schedule it nightly with a template launch.
Convert the existing cluster to an autoscaling persistent cluster with a minimum of zero workers and only preemptible secondary workers.
Enable Dataproc High Availability and schedule the cluster to hibernate during idle hours with Cloud Scheduler scripts.
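Creating an ephemeral (job-scoped) Dataproc cluster for each nightly run lets you specify the exact image, software version, and machine type the job needs. The entire cluster, masters included, is then deleted as soon as the job finishes. You pay only for the roughly two hours the cluster exists each night (plus provisioning time), which eliminates charges for idle masters and workers. Because every run gets a fresh cluster, configuration isolation is preserved as well. Autoscaling cannot take the masters or the minimum number of primary workers down to zero, so those VMs keep billing through the 22 idle hours. High Availability adds resilience rather than savings, and scripted idle-time shutdowns still leave a long-lived cluster to maintain, with its persistent disks still billed. Migrating to Dataflow would require rewriting the Spark code and does not preserve the custom Dataproc image the job depends on.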
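As an illustration of the recommended option, the sketch below builds an inline Dataproc workflow template whose `managed_cluster` is created for the workflow and torn down when it completes. It then submits the template with the `google-cloud-dataproc` client's `instantiate_inline_workflow_template` call. This is a minimal sketch under stated assumptions. The cluster name, image URI, machine type, main class, and JAR path are hypothetical placeholders. Nightly triggering (for example, from Cloud Composer or Cloud Scheduler) is not shown.

```python
"""Sketch: nightly Spark ETL on a job-scoped (managed) Dataproc cluster.

All names, URIs, and the machine type passed in are hypothetical placeholders.
"""


def build_workflow_template(
    cluster_name: str,
    image_uri: str,
    machine_type: str,
    main_class: str,
    jar_uri: str,
    num_workers: int = 30,
) -> dict:
    """Return an inline WorkflowTemplate (as a dict) with a managed cluster."""

    def instance_group(count: int) -> dict:
        # Every node uses the same custom image and machine type.
        return {
            "num_instances": count,
            "machine_type_uri": machine_type,
            "image_uri": image_uri,
        }

    return {
        "placement": {
            "managed_cluster": {
                "cluster_name": cluster_name,
                "config": {
                    "master_config": instance_group(3),  # three masters, as today
                    "worker_config": instance_group(num_workers),
                },
            }
        },
        "jobs": [
            {
                "step_id": "nightly-etl",
                "spark_job": {"main_class": main_class, "jar_file_uris": [jar_uri]},
            }
        ],
    }


def run_nightly(project_id: str, region: str, template: dict) -> None:
    """Instantiate the template; Dataproc deletes the managed cluster afterwards."""
    # Imported lazily so the builder above works without the client library.
    from google.cloud import dataproc_v1

    client = dataproc_v1.WorkflowTemplateServiceClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    parent = f"projects/{project_id}/regions/{region}"
    operation = client.instantiate_inline_workflow_template(
        request={"parent": parent, "template": template}
    )
    operation.result()  # wait for the workflow operation to finish
```

Keeping the template as plain data makes the image and machine type explicit in one place, so each night's cluster is configured identically without a long-lived cluster to drift.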
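A rough VM-hour comparison shows why the ephemeral cluster comfortably clears a 40% cut. The sketch assumes every VM costs the same per hour and ignores disks, provisioning time, and preemptible pricing. It also assumes that an autoscaled cluster idles at its three masters plus two primary workers; that floor is an assumption for illustration, not a measured figure.

```python
MASTERS, WORKERS = 3, 30
JOB_HOURS, DAY_HOURS = 2, 24


def daily_vm_hours(idle_vms: int) -> int:
    """VM-hours per day: all nodes during the job, `idle_vms` for the rest."""
    return (MASTERS + WORKERS) * JOB_HOURS + idle_vms * (DAY_HOURS - JOB_HOURS)


persistent = daily_vm_hours(idle_vms=MASTERS + WORKERS)  # 33*2 + 33*22 = 792
autoscaled = daily_vm_hours(idle_vms=MASTERS + 2)  # assumed floor: 66 + 5*22 = 176
ephemeral = daily_vm_hours(idle_vms=0)  # 33*2 = 66

for name, hours in [("persistent", persistent), ("autoscaled", autoscaled), ("ephemeral", ephemeral)]:
    saving = 1 - hours / persistent
    print(f"{name:>10}: {hours:4d} VM-hours/day, {saving:.0%} less than persistent")
```

Under these assumptions the ephemeral cluster uses one twelfth of the persistent cluster's VM-hours, while the autoscaled cluster still pays for its idle floor every night.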