GCP Professional Data Engineer Practice Question

Your on-premises 50-node Hadoop cluster runs nightly Spark and Hive batch jobs scheduled by Oozie. The code depends on standard Hadoop libraries and HDFS paths. You must move the workload to Google Cloud in two weeks with minimal refactoring, keep triggering jobs from your existing CI pipeline, and ensure clusters auto-scale and delete themselves when work finishes to control cost. Which Google Cloud service is the most appropriate execution target?

  • Rewrite all Spark and Hive transformations as BigQuery SQL and schedule them with BigQuery Data Transfer Service.

  • Run the pipelines on Cloud Dataflow by rewriting the Spark and Hive code in Apache Beam.

  • Create ephemeral Cloud Dataproc clusters backed by Cloud Storage and submit the existing Spark and Hive jobs without modification.

  • Package each Spark job into a container image and execute it on Cloud Run with automatic scaling.
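For context on the ephemeral Dataproc option, here is a minimal sketch using the google-cloud-dataproc Python client: it creates a short-lived cluster with an idle-delete TTL and submits an unmodified Spark jar staged in Cloud Storage. The project, region, bucket, jar path, main class, and TTL value are placeholders chosen for illustration, not part of the question.

```python
"""Minimal sketch: run an existing Spark job on an ephemeral Dataproc cluster.

All names (project, region, cluster, bucket, jar, main class) are placeholders.
"""
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"        # placeholder
REGION = "us-central1"           # placeholder
CLUSTER_NAME = "nightly-etl"     # placeholder
BUCKET = "my-dataproc-staging"   # placeholder; gs:// paths replace HDFS paths

endpoint = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
job_client = dataproc_v1.JobControllerClient(client_options=endpoint)

# Cluster config: a small cluster plus an idle-delete TTL so the cluster
# removes itself after the nightly jobs finish (30-minute idle is an assumption).
cluster = {
    "project_id": PROJECT_ID,
    "cluster_name": CLUSTER_NAME,
    "config": {
        "config_bucket": BUCKET,
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        "lifecycle_config": {"idle_delete_ttl": {"seconds": 1800}},
        # Optionally attach a pre-created autoscaling policy (placeholder name):
        # "autoscaling_config": {
        #     "policy_uri": f"projects/{PROJECT_ID}/regions/{REGION}"
        #                   "/autoscalingPolicies/nightly-etl-policy",
        # },
    },
}

create_op = cluster_client.create_cluster(
    request={"project_id": PROJECT_ID, "region": REGION, "cluster": cluster}
)
create_op.result()  # block until the cluster is ready

# Submit the existing Spark jar unchanged, with HDFS paths swapped for gs:// URIs.
job = {
    "placement": {"cluster_name": CLUSTER_NAME},
    "spark_job": {
        "main_class": "com.example.NightlyEtl",                    # placeholder
        "jar_file_uris": [f"gs://{BUCKET}/jobs/nightly-etl.jar"],  # placeholder
    },
}
submit_op = job_client.submit_job_as_operation(
    request={"project_id": PROJECT_ID, "region": REGION, "job": job}
)
print(submit_op.result().driver_output_resource_uri)
```

In practice the same flow is commonly driven from CI with plain `gcloud dataproc` commands or a Dataproc workflow template, which keeps the existing trigger mechanism intact while the idle-delete TTL handles cluster teardown.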
