GCP Professional Data Engineer Practice Question

Your team is designing a streaming Dataflow pipeline that ingests JSON events from Pub/Sub, enriches them, and writes the results to BigQuery. Every event must contain a non-empty "userId" field and a numeric "purchaseAmount" greater than zero. Records that fail either rule must be excluded from the BigQuery sink and instead sent to a separate Pub/Sub topic for later analysis. The team wants the simplest approach that keeps the validation logic inside the same pipeline with only one pass over the data. Which Beam pattern best satisfies these requirements?

  • Enable ignoreUnknownValues on BigQueryIO so that rows violating the rules are silently dropped during streaming inserts.

  • Use GroupByKey followed by CoGroupByKey to partition valid and invalid elements, then write each PCollection to its respective sink.

  • Configure a dead-letter topic on the input Pub/Sub subscription so that schema-violating messages are automatically rerouted without Dataflow code changes.

  • Add a ParDo that applies the validation rules and uses TupleTag side outputs to send invalid records to a secondary Pub/Sub sink while forwarding valid records to BigQuery.
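For reference, below is a minimal Java sketch of the multi-output ParDo pattern described in the last option, assuming the Beam Java SDK with the Google Cloud I/O connectors and Gson for JSON parsing. The project, subscription, topic, and table names and the output row fields are illustrative placeholders, not details given in the question.

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class ValidateAndRoute {

  // Tags identifying the two outputs of the validation ParDo.
  static final TupleTag<TableRow> VALID = new TupleTag<TableRow>() {};
  static final TupleTag<String> INVALID = new TupleTag<String>() {};

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Placeholder subscription name; substitute your own resources.
    PCollection<String> events =
        p.apply("ReadEvents",
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/events-sub"));

    // Single pass over the data: one ParDo emits valid rows to the main
    // output and invalid records to a tagged side output.
    PCollectionTuple routed = events.apply("Validate",
        ParDo.of(new DoFn<String, TableRow>() {
          @ProcessElement
          public void process(@Element String json, MultiOutputReceiver out) {
            try {
              JsonObject obj = JsonParser.parseString(json).getAsJsonObject();
              String userId = obj.has("userId") ? obj.get("userId").getAsString() : "";
              double amount = obj.has("purchaseAmount") ? obj.get("purchaseAmount").getAsDouble() : 0.0;
              if (!userId.isEmpty() && amount > 0) {
                out.get(VALID).output(
                    new TableRow().set("userId", userId).set("purchaseAmount", amount));
              } else {
                // Fails a business rule: route to the dead-letter output.
                out.get(INVALID).output(json);
              }
            } catch (RuntimeException e) {
              // Unparseable JSON is also treated as invalid.
              out.get(INVALID).output(json);
            }
          }
        }).withOutputTags(VALID, TupleTagList.of(INVALID)));

    // Valid records go to BigQuery (placeholder table, assumed to exist).
    routed.get(VALID).apply("WriteValid",
        BigQueryIO.writeTableRows()
            .to("my-project:analytics.purchases")
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    // Invalid records go to a separate Pub/Sub topic for later analysis.
    routed.get(INVALID).apply("WriteInvalid",
        PubsubIO.writeStrings().to("projects/my-project/topics/invalid-events"));

    p.run();
  }
}
```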

Objective: Ingesting and processing the data