GCP Professional Data Engineer Practice Question

Your company ingests real-time purchase events through a Cloud Pub/Sub → Dataflow → BigQuery pipeline. The Dataflow job currently acknowledges each message as soon as it is read and only logs JSON parsing errors. Recently, an upstream bug produced malformed JSON for several hours; the pipeline acknowledged these messages, so they were neither processed nor recoverable. You must redesign ingestion so malformed events are retained for later inspection and replay without increasing latency for valid events. Which approach best meets these needs?

  • Write incoming events to a staging table with BigQuery MERGE and schedule daily table snapshots to Cloud Storage so you can roll back if corruption occurs.

  • Enable Pub/Sub exactly-once delivery and rely on BigQuery time-travel to restore any rows that might be missing from the production table.

  • Insert a Cloud Function publisher proxy that validates JSON and drops any message that fails validation before it reaches Pub/Sub.

  • Configure the subscription with a dead-letter topic and modify the Dataflow pipeline to acknowledge only successfully parsed messages; unacknowledged messages are eventually routed to the dead-letter topic for later reprocessing.
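For context on the dead-letter pattern described in the last option: the dead-letter topic and maximum delivery attempts are set on the Pub/Sub subscription itself (for example with gcloud pubsub subscriptions update), and the pipeline's job is simply to avoid acknowledging messages it cannot parse. The sketch below is a minimal, hypothetical Apache Beam (Python) illustration, not an official reference solution; the project, subscription, and table names are placeholders, and it assumes the parse step is fused with the Pub/Sub read so that a parse failure keeps the message unacknowledged.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class ParseJsonOrFail(beam.DoFn):
    """Parses Pub/Sub message bytes as JSON; raises on malformed input."""

    def process(self, message):
        # json.loads raises on malformed JSON; the element then fails,
        # the Pub/Sub message is not acknowledged, and after the
        # subscription's max delivery attempts Pub/Sub routes it to the
        # dead-letter topic configured on the subscription.
        yield json.loads(message.decode("utf-8"))


def run():
    # Streaming pipeline: Pub/Sub -> parse -> BigQuery.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/purchase-events"  # hypothetical
            )
            | "ParseJson" >> beam.ParDo(ParseJsonOrFail())
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:sales.purchase_events",  # hypothetical, pre-existing table
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```

Replay is then a matter of re-publishing messages from the dead-letter topic (via a subscription attached to it) back into the main topic once the upstream bug is fixed, so valid-event latency is never affected.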

GCP Professional Data Engineer
Maintaining and automating data workloads