A retailer runs a 50-node on-prem Hadoop cluster that supports ad-hoc Hive queries for data analysts during the day and Spark ETL jobs overnight. Management wants a fast lift-and-shift to Google Cloud that keeps the existing Hive metastore and avoids rewriting job submission scripts. They accept some idle cost in the short term but want the option to manually scale down workers during quiet periods. Which Dataproc deployment approach best satisfies these requirements while modernization is planned?
Rewrite the Hive queries as BigQuery SQL and migrate all data directly into BigQuery to eliminate cluster management entirely.
Use Dataproc Serverless for Spark, submitting every batch and interactive job through the serverless API instead of managing clusters.
Create a single persistent Dataproc cluster sized similarly to the on-prem environment, store data in Cloud Storage, and manually resize the cluster when workload volumes change.
Refactor each Spark and Hive workload to launch its own ephemeral Dataproc cluster that is created at job start and deleted on completion.
A single long-lived (persistent) Dataproc cluster reproduces the always-on characteristics of the on-prem environment, lets analysts continue running interactive Hive queries, and allows existing Spark batch scripts to run without change. Because the cluster persists, the Hive metastore and YARN resource manager remain available all day-critical for an unmodified lift-and-shift. You can still lower costs later by manually resizing the cluster or enabling autoscaling policies. Ephemeral clusters, Dataproc Serverless, or moving directly to BigQuery would all require changes to job orchestration, SQL dialect, or execution flow, contradicting the goal of an initial, low-risk migration.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is Dataproc and how does it work on Google Cloud?
Open an interactive chat with Bash
How does Cloud Storage integrate with Dataproc for big data processing?
Open an interactive chat with Bash
What is the role of the Hive metastore in Hadoop or Dataproc environments?
Open an interactive chat with Bash
What is a Dataproc cluster?
Open an interactive chat with Bash
What is the Hive metastore and why is it important in this scenario?
Open an interactive chat with Bash
What is the benefit of using Cloud Storage in Dataproc deployments?
Open an interactive chat with Bash
GCP Professional Data Engineer
Maintaining and automating data workloads
Your Score:
Report Issue
Bash, the Crucial Exams Chat Bot
AI Bot
Loading...
Loading...
Loading...
Pass with Confidence.
IT & Cybersecurity Package
You have hit the limits of our free tier, become a Premium Member today for unlimited access.
Military, Healthcare worker, Gov. employee or Teacher? See if you qualify for a Community Discount.
Monthly
$19.99
$19.99/mo
Billed monthly, Cancel any time.
3 Month Pass
$44.99
$14.99/mo
One time purchase of $44.99, Does not auto-renew.
MOST POPULAR
Annual Pass
$119.99
$9.99/mo
One time purchase of $119.99, Does not auto-renew.
BEST DEAL
Lifetime Pass
$189.99
One time purchase, Good for life.
What You Get
All IT & Cybersecurity Package plans include the following perks and exams .