GCP Professional Data Engineer Practice Question

Your company runs hundreds of nightly ETL workloads implemented as Apache Spark jobs on an on-premises Hadoop cluster. Management wants to migrate these pipelines to Google Cloud, but the CTO insists the Spark code remain unchanged so it can later run on Amazon EMR. The data engineering team also wants to avoid managing clusters or manually patching software in Google Cloud. Which approach best meets both the portability and operational requirements?

  • Load the source data into BigQuery and replace the Spark transformations with SQL and dbt models orchestrated by Cloud Composer.

  • Rewrite the pipelines in Apache Beam and execute them on Cloud Dataflow so they can later run on any Beam-compatible runner.

  • Deploy the existing Spark jobs on on-demand Cloud Dataproc clusters, where the service provisions and patches the underlying Hadoop and Spark runtime automatically (see the sketch after these options).

  • Package the Spark jobs into containers and orchestrate them on Google Kubernetes Engine using the open-source Spark Operator.
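To make the Dataproc option concrete, here is a minimal sketch using the google-cloud-dataproc Python client to create an ephemeral cluster and submit an unmodified Spark jar. The project ID, region, machine types, bucket path, and the com.example.NightlyEtl entry point are all hypothetical placeholders, not values from the question.

```python
from google.cloud import dataproc_v1

project_id = "my-project"       # hypothetical project ID
region = "us-central1"          # hypothetical region
cluster_name = "etl-ephemeral"  # hypothetical cluster name

endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

# Create a short-lived cluster; Dataproc provisions and patches
# the Hadoop/Spark runtime, so nothing is managed by hand.
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # Auto-delete the cluster after 30 idle minutes, leaving
        # no standing infrastructure between nightly runs.
        "lifecycle_config": {"idle_delete_ttl": {"seconds": 1800}},
    },
}
cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
).result()

# Submit the Spark jar exactly as it ran on the on-premises cluster.
job_client = dataproc_v1.JobControllerClient(client_options=endpoint)
job = {
    "placement": {"cluster_name": cluster_name},
    "spark_job": {
        "main_class": "com.example.NightlyEtl",  # hypothetical entry point
        "jar_file_uris": ["gs://my-bucket/jobs/nightly-etl.jar"],  # hypothetical path
    },
}
job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
).result()
```

Because the job is submitted as a plain Spark jar, the same artifact remains runnable on any Spark distribution, including Amazon EMR, while the idle-delete TTL removes the cluster after use so there is nothing left to patch.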

GCP Professional Data Engineer
Designing data processing systems