GCP Professional Data Engineer Practice Question

Your team runs a Spark Structured Streaming pipeline that reads events from Pub/Sub, enriches them with lookup data stored in Cloud Storage, and writes low-latency aggregates to Bigtable for a near-real-time dashboard. The job must run 24×7, allow on-the-fly code updates for tuning, and survive individual worker failures without interrupting the stream. You also need the flexibility to scale the number of workers up or down automatically as traffic fluctuates. Which Dataproc deployment model best meets these requirements while avoiding unnecessary idle costs?

  • Run the job on a persistent Dataproc cluster configured with an autoscaling policy.

  • Create a Cloud Composer DAG that launches a new Dataproc cluster every hour to run the streaming job and tears it down when the hour ends.

  • Use Dataproc Serverless for Spark to submit the streaming job as a batch workload that provisions resources on demand and terminates when the driver exits.

  • Submit the job to a transient (ephemeral) Dataproc cluster created by a workflow template and deleted after each micro-batch completes.
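
To make the scenario concrete, here is a minimal sketch of such a pipeline. It is illustrative only: Spark Structured Streaming has no first-party Pub/Sub source, so the sketch substitutes the Pub/Sub Lite Spark connector as a stand-in, and the Bigtable write goes through foreachBatch with the google-cloud-bigtable client library. All project, bucket, subscription, table, and column names (my-project, my-bucket, device_id, and so on) are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("stream-enrich-aggregate").getOrCreate()

    # Static lookup data kept on Cloud Storage (hypothetical path; maps
    # device_id -> region).
    lookups = spark.read.parquet("gs://my-bucket/lookups/")

    # Pub/Sub Lite connector used as a stand-in streaming source.
    raw = (
        spark.readStream.format("pubsublite")
        .option(
            "pubsublite.subscription",
            "projects/my-project/locations/us-central1-a/subscriptions/my-sub",
        )
        .load()
    )

    # Parse the binary message payload into columns (hypothetical schema).
    events = raw.select(
        F.from_json(F.col("data").cast("string"),
                    "device_id STRING, value DOUBLE").alias("e"),
        "publish_timestamp",
    ).select("e.*", "publish_timestamp")

    # Enrich with the lookup data, then keep one-minute windowed counts.
    aggregates = (
        events.join(lookups, "device_id")
        .withWatermark("publish_timestamp", "1 minute")
        .groupBy(F.window("publish_timestamp", "1 minute"), "region")
        .count()
    )

    def write_to_bigtable(batch_df, batch_id):
        """Write one micro-batch of aggregates to Bigtable."""
        def write_partition(rows):
            from google.cloud import bigtable  # imported on the executors
            table = (
                bigtable.Client(project="my-project")
                .instance("my-instance")
                .table("dashboard_aggregates")
            )
            mutations = []
            for r in rows:
                key = f"{r['region']}#{r['window'].start.isoformat()}".encode()
                row = table.direct_row(key)
                row.set_cell("agg", "count", str(r["count"]).encode())
                mutations.append(row)
            if mutations:
                table.mutate_rows(mutations)

        batch_df.foreachPartition(write_partition)

    query = (
        aggregates.writeStream
        .foreachBatch(write_to_bigtable)
        .outputMode("update")
        .option("checkpointLocation", "gs://my-bucket/checkpoints/agg/")
        .start()
    )
    query.awaitTermination()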
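
And if the persistent-cluster option is the one under consideration, the surrounding setup might look like the following gcloud/YAML sketch: import an autoscaling policy, create a long-running cluster bound to it, and submit the streaming job to that cluster. The policy fields mirror the Dataproc AutoscalingPolicy resource; the policy name, cluster name, region, and instance counts are placeholders.

    # policy.yaml -- hypothetical bounds for scaling workers with traffic
    workerConfig:
      minInstances: 2
      maxInstances: 10
    secondaryWorkerConfig:
      maxInstances: 20
    basicAlgorithm:
      cooldownPeriod: 5m
      yarnConfig:
        scaleUpFactor: 0.5
        scaleDownFactor: 0.5
        gracefulDecommissionTimeout: 1h

    # Import the policy, create a persistent cluster that uses it, and
    # submit the streaming job; the cluster keeps running across job
    # restarts and worker replacements.
    gcloud dataproc autoscaling-policies import streaming-policy \
        --source=policy.yaml --region=us-central1
    gcloud dataproc clusters create dashboard-stream-cluster \
        --region=us-central1 \
        --autoscaling-policy=streaming-policy
    gcloud dataproc jobs submit pyspark gs://my-bucket/pipeline.py \
        --cluster=dashboard-stream-cluster --region=us-central1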

Exam objective: Maintaining and automating data workloads