GCP Professional Data Engineer Practice Question

Your team runs a Spark Structured Streaming pipeline that reads events from Pub/Sub, enriches them with lookup data stored in Cloud Storage, and writes low-latency aggregates to Bigtable for a near-real-time dashboard. The job must run 24×7, allow on-the-fly code updates for tuning, and survive individual worker failures without interrupting the stream. You also need the flexibility to scale the number of workers up or down automatically as traffic fluctuates. Which Dataproc deployment model best meets these requirements while controlling unnecessary idle costs?

Submit the job to a transient (ephemeral) Dataproc cluster created by a workflow template and deleted after each micro-batch completes.
Use Dataproc Serverless for Spark to submit the streaming job as a batch task that spins up resources on demand and terminates when the driver exits.
Run the job on a persistent Dataproc cluster configured with an autoscaling policy.
Create a Cloud Composer DAG that launches a new Dataproc cluster every hour to run the streaming job and tears it down when the hour ends.
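
For readers unfamiliar with the "autoscaling policy" named in one of the options, the sketch below builds the general shape of such a policy as a plain Python dictionary. The field names follow the Dataproc autoscaling policy resource (workerConfig, secondaryWorkerConfig, basicAlgorithm.yarnConfig); the helper name build_autoscaling_policy and every numeric value are hypothetical and chosen only for illustration, not taken from the question.

```python
import json


def build_autoscaling_policy(min_secondary, max_secondary,
                             scale_up_factor, scale_down_factor):
    """Return a dict shaped like a Dataproc autoscaling policy.

    Hypothetical helper: the values are illustrative, not recommendations.
    """
    if min_secondary < 0 or min_secondary > max_secondary:
        raise ValueError("need 0 <= min_secondary <= max_secondary")
    for name, factor in (("scale_up_factor", scale_up_factor),
                         ("scale_down_factor", scale_down_factor)):
        # Both factors are fractions between 0.0 and 1.0.
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"{name} must be within [0.0, 1.0]")

    return {
        # Primary workers held at a fixed size (min == max).
        "workerConfig": {"minInstances": 2, "maxInstances": 2},
        # Secondary workers absorb traffic swings.
        "secondaryWorkerConfig": {
            "minInstances": min_secondary,
            "maxInstances": max_secondary,
        },
        "basicAlgorithm": {
            # Wait between scaling evaluations (assumed value).
            "cooldownPeriod": "4m",
            "yarnConfig": {
                "scaleUpFactor": scale_up_factor,
                "scaleDownFactor": scale_down_factor,
                # Time allowed for work to drain before a node is removed.
                "gracefulDecommissionTimeout": "1h",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_autoscaling_policy(0, 10, 0.5, 1.0), indent=2))
```

A policy of this shape is attached to a cluster rather than to an individual job, which is why it is relevant when comparing long-lived and short-lived cluster options.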

GCP Professional Data Engineer

Maintaining and automating data workloads
