GCP Professional Data Engineer Practice Question

An enterprise runs nightly Apache Spark ETL jobs written in Scala on an on-premises Hadoop YARN cluster. The team wants to lift and shift these jobs to Google Cloud with almost no code changes. The solution must provide on-demand autoscaling, support preemptible workers to reduce cost, continue to use a Hive Metastore for shared table definitions, and let engineers SSH into worker nodes for live debugging. Which Google Cloud service best meets all of these requirements?

  • Schedule equivalent SQL transformations as BigQuery scheduled queries and stored procedures.

  • Migrate the Spark code to Apache Beam and execute the pipelines on Cloud Dataflow.

  • Create on-demand Dataproc clusters that run the existing Spark jobs, enable autoscaling with preemptible secondary workers, and connect to Dataproc Metastore.

  • Rebuild the ETL logic in Cloud Data Fusion and run the pipelines in batch mode on Cloud Dataflow.
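As context for the lift-and-shift requirement, the sketch below shows the kind of existing Scala Spark job the scenario describes; the object, database, and table names are hypothetical. Because it relies only on standard Spark APIs and a Hive Metastore for table resolution, a job like this can typically be submitted unchanged to a Dataproc cluster that is attached to a Dataproc Metastore.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical nightly ETL job; database, table, and app names are placeholders.
object NightlyOrdersEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nightly-orders-etl")
      .enableHiveSupport()   // resolve tables through the shared Hive Metastore
      .getOrCreate()

    // Read a table whose definition lives in the metastore.
    val orders = spark.sql(
      "SELECT order_id, customer_id, amount, order_date FROM raw.orders")

    // Representative transformation step: aggregate order amounts per day.
    val dailyTotals = orders.groupBy("order_date").sum("amount")

    // Write the result back as a managed table registered in the metastore.
    dailyTotals.write.mode("overwrite").saveAsTable("curated.daily_order_totals")

    spark.stop()
  }
}
```

On a YARN-based cluster this job would be launched with spark-submit; on Dataproc the same artifact can be submitted as a Spark job against a cluster configured with an autoscaling policy and preemptible secondary workers, with no changes to the Scala code itself.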
