GCP Professional Data Engineer Practice Question

Your organization needs to lift-and-shift 7,000 existing batch and streaming PySpark jobs that currently run on an on-premises Apache Hadoop cluster. Leadership insists that the cloud solution remain based on open-source processing engines to minimize vendor lock-in and allow future migration to other clouds. The data engineering team also wants the ability to create short-lived clusters that automatically delete themselves once each job finishes to keep costs low. Which Google Cloud service best fulfills these requirements while requiring little or no refactoring of the existing PySpark code?

  • Migrate the transformations into BigQuery user-defined functions (UDFs) and schedule them as BigQuery jobs.

  • Replicate the logic in Cloud Data Fusion visual pipelines, which are executed on managed Dataproc clusters.

  • Spin up an on-demand Cloud Dataproc cluster for each job and enable auto-deletion so the cluster is removed once its Spark or Hadoop job finishes (see the sketch after the options).

  • Rewrite the PySpark workloads as Apache Beam pipelines and execute them with Cloud Dataflow Flex Templates.
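
For context on the ephemeral-cluster pattern in the Dataproc option: the sketch below is a minimal, illustrative example assuming the google-cloud-dataproc Python client library. The project ID, region, machine types, bucket, and script path are all placeholders. It creates a cluster that deletes itself after ten minutes of idleness, then submits an existing PySpark script to it unchanged.

    from google.cloud import dataproc_v1

    PROJECT = "my-project"        # placeholder project ID
    REGION = "us-central1"        # placeholder region
    CLUSTER = "ephemeral-spark"   # placeholder cluster name
    endpoint = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}

    # Create a short-lived cluster: idle_delete_ttl tears the cluster
    # down automatically after 10 minutes with no running jobs.
    cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
    cluster = {
        "project_id": PROJECT,
        "cluster_name": CLUSTER,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
            "lifecycle_config": {"idle_delete_ttl": {"seconds": 600}},
        },
    }
    cluster_client.create_cluster(
        request={"project_id": PROJECT, "region": REGION, "cluster": cluster}
    ).result()  # block until the cluster is running

    # Submit an existing PySpark script as-is; the job code needs no changes.
    job_client = dataproc_v1.JobControllerClient(client_options=endpoint)
    job = {
        "placement": {"cluster_name": CLUSTER},
        "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/etl_job.py"},
    }
    job_client.submit_job_as_operation(
        request={"project_id": PROJECT, "region": REGION, "job": job}
    ).result()  # wait for the job to finish

The gcloud CLI exposes the same lifecycle controls through the --max-idle and --max-age flags on "gcloud dataproc clusters create", which suits script-driven, per-job cluster creation.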

Exam objective: Designing data processing systems