Your on-premises 50-node Hadoop cluster runs nightly Spark and Hive batch jobs scheduled by Oozie. The code depends on standard Hadoop libraries and HDFS paths. You must move the workload to Google Cloud in two weeks with minimal refactoring, keep triggering jobs from your existing CI pipeline, and ensure clusters auto-scale and delete themselves when work finishes to control cost. Which Google Cloud service is the most appropriate execution target?
Run the pipelines on Cloud Dataflow by rewriting the Spark and Hive code in Apache Beam.
Rewrite all Spark and Hive transformations as BigQuery SQL and schedule them with BigQuery Data Transfer Service.
Create ephemeral Cloud Dataproc clusters backed by Cloud Storage and submit the existing Spark and Hive jobs without modification.
Package each Spark job into a container image and execute it on Cloud Run with automatic scaling.
Cloud Dataproc provides managed Hadoop and Spark clusters that read and write Cloud Storage directly through the preinstalled Cloud Storage connector, so existing Spark and Hive code written against HDFS paths usually runs with little or no change beyond pointing those paths at gs:// locations. Clusters typically start in about 90 seconds, support autoscaling, and can be configured to delete themselves when jobs complete or the cluster goes idle, which lowers operational overhead and cost. Jobs can still be triggered from an existing CI pipeline through the gcloud CLI or the Dataproc API. Moving the workload to Dataflow would require rewriting the jobs in Apache Beam. Moving it to BigQuery would require translating all of the Spark and Hive logic into SQL before BigQuery Data Transfer Service could schedule it. Cloud Run is not designed to run distributed Hadoop or Spark frameworks. Therefore, ephemeral Cloud Dataproc clusters are the best choice for a rapid lift-and-shift migration with minimal code changes and automated cluster lifecycle management.
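As a rough illustration of what the lift-and-shift could look like from a CI job, the sketch below maps HDFS-style paths onto a Cloud Storage bucket and assembles the `gcloud dataproc` commands for an autoscaling cluster that deletes itself after an idle period. The bucket, cluster, policy, class, and jar names are hypothetical placeholders, and the helper functions are illustrative rather than part of any Google library. The sketch only builds argument lists; a real pipeline would pass them to `subprocess.run`.

```python
from typing import List
from urllib.parse import urlparse


def hdfs_to_gcs(path: str, bucket: str) -> str:
    """Map an hdfs:// URI or a bare absolute path onto gs://<bucket>/<path>."""
    parsed = urlparse(path)
    if parsed.scheme not in ("", "hdfs"):
        return path  # already gs:// or another scheme; leave untouched
    return f"gs://{bucket}/{parsed.path.lstrip('/')}"


def create_cluster_cmd(cluster: str, region: str, policy: str,
                       max_idle: str = "10m") -> List[str]:
    """Build the command for an autoscaling cluster that self-deletes when idle."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        f"--autoscaling-policy={policy}",  # assumes this policy already exists
        f"--max-idle={max_idle}",          # delete the cluster after this idle time
    ]


def submit_spark_cmd(cluster: str, region: str, main_class: str,
                     jar: str, job_args: List[str]) -> List[str]:
    """Build the command that submits an existing Spark jar to the cluster."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={main_class}",
        f"--jars={jar}",
        "--", *job_args,  # everything after "--" is passed to the job itself
    ]


# Hypothetical names, for illustration only.
BUCKET = "example-batch-data"
input_path = hdfs_to_gcs("hdfs://namenode:8020/data/events/2024", BUCKET)

create_cmd = create_cluster_cmd("nightly-etl", "us-central1", "nightly-policy")
submit_cmd = submit_spark_cmd(
    "nightly-etl", "us-central1",
    "com.example.NightlyJob",
    f"gs://{BUCKET}/jars/nightly-job.jar",
    [input_path],
)

print(" ".join(create_cmd))
print(" ".join(submit_cmd))
```

Hive scripts can be submitted in a similar way with `gcloud dataproc jobs submit hive`. A workflow template with a managed cluster is another option when the cluster should be deleted as soon as the last job finishes rather than after an idle timeout.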
GCP Professional Data Engineer
Ingesting and processing the data