A retailer is migrating its on-premises Hadoop environment to Google Cloud. The data engineering team will run Spark Structured Streaming jobs that ingest inventory events 24×7 and must keep end-to-end latency under one minute. During the day, analysts connect with Hive to perform interactive, ad-hoc queries against the same datasets. The team needs the flexibility to install custom Hadoop libraries and is comfortable paying for always-on capacity to avoid startup delays. Which Dataproc deployment model best satisfies these requirements?
Maintain a single persistent Dataproc cluster that runs the streaming Spark jobs continuously and allows analysts to submit interactive Hive queries on demand.
Schedule Cloud Composer to launch Dataproc Serverless Spark batches for both streaming ingestion and analyst queries.
Store data in BigQuery and replace Spark streaming with scheduled BigQuery queries, eliminating the need for any Dataproc cluster.
Use a Dataproc workflow template that creates an ephemeral cluster for each Spark streaming job and deletes it when the job finishes.
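Persistent Dataproc clusters are designed for long-running, stateful workloads that must stay available, such as continuous Spark or Flink streaming pipelines and interactive SQL engines (Hive, Presto, Spark SQL). Because the cluster and its scheduler (YARN, or Kubernetes for Dataproc on GKE) stay up, streaming jobs keep low, predictable latency, and analysts can connect at any time without waiting for a cluster to be created. Initialization actions let the team install its custom Hadoop libraries on every node once, when the cluster is created. Each alternative fails at least one requirement. An ephemeral cluster per job suits batch work that ends, but a 24×7 streaming job never finishes, and analysts have no standing cluster to query. Dataproc Serverless batches add startup delay to every run, offer less control over the node environment, and provide no interactive Hive endpoint. Replacing Spark with scheduled BigQuery queries swaps continuous ingestion for periodic runs and abandons the Spark and Hive stack the team wants to keep.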
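The persistent-cluster pattern can be sketched with the `google-cloud-dataproc` Python client. This is a minimal illustration, not a reference deployment: the project ID, region, cluster name, machine types, worker count, and the `gs://` initialization-action path are all hypothetical placeholders. The key design choice is what the config leaves out. It sets no `lifecycle_config` (idle or scheduled auto-delete), so the cluster stays up until someone deletes it, which is the opposite of an ephemeral, job-scoped cluster.

```python
"""Sketch: build (and optionally create) a persistent Dataproc cluster.

All concrete values (project, region, names, machine types, gs:// paths)
are hypothetical placeholders, not values from the scenario.
"""
from typing import Any, Dict


def build_persistent_cluster(
    project_id: str,
    cluster_name: str,
    init_action_uri: str,
    num_workers: int = 2,
) -> Dict[str, Any]:
    """Return a Dataproc Cluster resource (as a dict) for a long-lived cluster.

    No lifecycle_config is set, so there is no idle or scheduled deletion:
    the cluster keeps running for 24x7 streaming and on-demand Hive queries.
    """
    if num_workers < 2:
        # Standard (non-single-node) Dataproc clusters need at least 2 primary workers.
        raise ValueError("a standard Dataproc cluster needs at least 2 workers")
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n2-standard-4"},
            "worker_config": {
                "num_instances": num_workers,
                "machine_type_uri": "n2-standard-4",
            },
            # Hypothetical script that installs the team's custom Hadoop libraries
            # on every node when the cluster is created.
            "initialization_actions": [{"executable_file": init_action_uri}],
        },
    }


def create_cluster(project_id: str, region: str, cluster: Dict[str, Any]) -> Any:
    """Submit the cluster config and block until creation finishes."""
    # Imported lazily so the config builder works without the client library installed.
    from google.cloud import dataproc_v1

    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    return operation.result()  # long-running operation; waits for the cluster


if __name__ == "__main__":
    cfg = build_persistent_cluster(
        "my-project", "inventory-streaming", "gs://my-bucket/install-libs.sh", 4
    )
    print(cfg["config"]["worker_config"]["num_instances"])
```

Once the cluster exists, the streaming pipeline and the analysts' queries would both target it, for example with `gcloud dataproc jobs submit pyspark` for the Spark job and `gcloud dataproc jobs submit hive` (or a direct Hive connection) for ad-hoc SQL.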