GCP Professional Data Engineer Practice Question

A media company ingests clickstream data 24×7 from its mobile applications and processes it in near-real-time using Spark Structured Streaming and a custom Flink job. The pipeline must continuously enrich events with user profiles stored in Bigtable and write the results to BigQuery with end-to-end latency under one minute. Operators also need to run on-demand SQL queries against the same Spark metastore during business hours. Which Dataproc deployment model best meets these requirements while balancing cost and operational complexity?

  • Migrate the streaming code to Cloud Dataflow and spin up an on-demand Dataproc cluster only for interactive SQL queries.

  • Submit each Spark and Flink job to a separate ephemeral Dataproc job-cluster that terminates when the job finishes.

  • Use Dataproc Serverless for all streaming and interactive workloads and disable any persistent cluster resources.

  • Create a persistent Dataproc cluster with autoscaling enabled and run both Spark Streaming and Flink jobs continuously.
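The persistent-cluster option hinges on autoscaling to keep cost in check while the streaming jobs run around the clock. A minimal sketch of a Dataproc autoscaling policy is below; all instance counts and timeouts are illustrative assumptions, not values from the question:

```yaml
# Illustrative Dataproc autoscaling policy (all values are assumptions).
# Import with:
#   gcloud dataproc autoscaling-policies import streaming-policy \
#     --source=policy.yaml --region=us-central1
workerConfig:
  minInstances: 2        # floor for the always-on Spark/Flink streaming jobs
  maxInstances: 10
secondaryWorkerConfig:
  minInstances: 0        # secondary workers absorb business-hours query bursts
  maxInstances: 20
basicAlgorithm:
  cooldownPeriod: 2m
  yarnConfig:
    scaleUpFactor: 0.5
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 1h  # drain running work before removing nodes
```

A policy like this would be attached at cluster creation with `--autoscaling-policy`, and Flink can be enabled on the same cluster via the `--optional-components=FLINK` flag of `gcloud dataproc clusters create`.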

Objective: Maintaining and automating data workloads