GCP Professional Data Engineer Practice Question

Your company runs hundreds of nightly ETL workloads implemented as Apache Spark jobs on an on-premises Hadoop cluster. Management wants to migrate these pipelines to Google Cloud, but the CTO insists the Spark code remain unchanged so it can later run on Amazon EMR. The data engineering team also wants to avoid managing clusters or manually patching software in Google Cloud. Which approach best meets both the portability and operational requirements?

  • Load the source data into BigQuery and replace the Spark transformations with SQL and dbt models orchestrated by Cloud Composer.

  • Rewrite the pipelines in Apache Beam and execute them on Cloud Dataflow so they can later run on any Beam-compatible runner.

  • Deploy the existing Spark jobs on on-demand Cloud Dataproc clusters, where the service provisions and patches the underlying Hadoop and Spark runtime automatically (see the sketch after these options).

  • Package the Spark jobs into containers and orchestrate them on Google Kubernetes Engine using the open-source Spark Operator.
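To make the Dataproc option concrete, here is a minimal sketch using the google-cloud-dataproc Python client to create an ephemeral cluster and submit an unmodified Spark jar. The project ID, region, machine types, bucket path, and the com.example.NightlyEtl entry point are all hypothetical placeholders, not values from the question.

```python
from google.cloud import dataproc_v1

project_id = "my-project"       # hypothetical project ID
region = "us-central1"          # hypothetical region
cluster_name = "etl-ephemeral"  # hypothetical cluster name

endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

# Create a short-lived cluster; Dataproc provisions and patches
# the Hadoop/Spark runtime, so nothing is managed by hand.
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # Auto-delete the cluster after 30 idle minutes, leaving
        # no standing infrastructure between nightly runs.
        "lifecycle_config": {"idle_delete_ttl": {"seconds": 1800}},
    },
}
cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
).result()

# Submit the Spark jar exactly as it ran on the on-premises cluster.
job_client = dataproc_v1.JobControllerClient(client_options=endpoint)
job = {
    "placement": {"cluster_name": cluster_name},
    "spark_job": {
        "main_class": "com.example.NightlyEtl",  # hypothetical entry point
        "jar_file_uris": ["gs://my-bucket/jobs/nightly-etl.jar"],  # hypothetical path
    },
}
job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
).result()
```

Because the job is submitted as a plain Spark jar, the same artifact remains runnable on any Spark distribution, including Amazon EMR, while the idle-delete TTL removes the cluster after use so there is nothing left to patch.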

GCP Professional Data Engineer
Designing data processing systems