Your analytics team runs four large Spark batch ETL jobs every night, each with different library and executor memory requirements. During business hours, data scientists occasionally run short interactive Hive queries that must return within minutes. You want to minimize Dataproc costs without sacrificing performance or the isolation of the nightly jobs from one another. Which strategy best meets these goals?
Submit each nightly batch job to its own ephemeral Dataproc cluster and delete the cluster on completion; maintain a small persistent cluster for interactive queries.
Run both the batch jobs and interactive queries on a single persistent Dataproc cluster with autoscaling disabled to avoid provisioning delays.
Provision a separate always-on persistent Dataproc cluster for each nightly batch job to guarantee resource isolation, and shut them down in the morning.
Keep an always-on persistent cluster sized for the nightly batch peak, and launch short-lived job-based clusters only for interactive queries.
Creating an ephemeral Dataproc cluster for each nightly batch job lets you specify the exact machine type, initialization actions, and software packages that job needs, then delete the cluster as soon as the workload finishes. This eliminates idle-time costs and guarantees job isolation. Keeping a small, always-on persistent cluster exclusively for daytime interactive queries avoids the cold-start delay of spinning up a new cluster for each query while limiting the number of VMs that remain running 24×7. The other options are less cost-efficient than this hybrid approach: a single persistent cluster with autoscaling disabled keeps VMs sized for the batch peak running all day and provides no isolation between jobs; a separate persistent cluster per job stays up for the whole night rather than only for each job's runtime; and an always-on cluster sized for the nightly batch peak sits mostly idle during the day while forcing interactive queries to wait for short-lived clusters to start.
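The cost argument above can be made concrete with a rough VM-hour comparison. The sketch below is illustrative only: the job names, worker counts, runtimes, the 8-hour night window, and the 3-worker interactive cluster are all hypothetical figures, not values from the question. It also ignores master nodes, cluster startup time, and pricing differences between machine types, and it assumes the four jobs run concurrently when sizing a shared cluster.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24


@dataclass(frozen=True)
class BatchJob:
    name: str
    workers: int          # worker VMs the job needs (hypothetical)
    runtime_hours: float  # how long the job runs each night (hypothetical)


# Hypothetical nightly workload; none of these numbers come from the question.
JOBS = [
    BatchJob("job_a", workers=10, runtime_hours=2.0),
    BatchJob("job_b", workers=20, runtime_hours=1.5),
    BatchJob("job_c", workers=8, runtime_hours=3.0),
    BatchJob("job_d", workers=16, runtime_hours=1.0),
]
INTERACTIVE_WORKERS = 3  # assumed size of the small persistent cluster
NIGHT_WINDOW_HOURS = 8   # assumed length of the nightly batch window


def hybrid_vm_hours(jobs, interactive_workers):
    """Ephemeral cluster per job (billed only while the job runs)
    plus a small persistent cluster that runs all day."""
    batch = sum(job.workers * job.runtime_hours for job in jobs)
    return batch + interactive_workers * HOURS_PER_DAY


def always_on_peak_vm_hours(jobs):
    """One persistent cluster sized to run every job concurrently,
    kept up 24 hours a day and shared with interactive queries."""
    peak_workers = sum(job.workers for job in jobs)
    return peak_workers * HOURS_PER_DAY


def per_job_night_window_vm_hours(jobs, window_hours, interactive_workers):
    """One cluster per job kept up for the whole night window,
    plus the same small persistent cluster for daytime queries."""
    batch = sum(job.workers for job in jobs) * window_hours
    return batch + interactive_workers * HOURS_PER_DAY


if __name__ == "__main__":
    print("hybrid:", hybrid_vm_hours(JOBS, INTERACTIVE_WORKERS))
    print("always-on peak:", always_on_peak_vm_hours(JOBS))
    print(
        "per-job, full night:",
        per_job_night_window_vm_hours(JOBS, NIGHT_WINDOW_HOURS, INTERACTIVE_WORKERS),
    )
```

Under these assumed figures the hybrid approach uses far fewer worker VM-hours per day than either alternative, because batch workers are billed only for each job's actual runtime. The exact gap depends entirely on the real job sizes and runtimes.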
GCP Professional Data Engineer
Maintaining and automating data workloads