GCP Professional Data Engineer Practice Question

Your team is designing a streaming Dataflow pipeline that ingests JSON events from Pub/Sub, enriches them, and writes the results to BigQuery. Every event must contain a non-empty "userId" field and a numeric "purchaseAmount" greater than zero. Records that fail either rule must be excluded from the BigQuery sink and instead sent to a separate Pub/Sub topic for later analysis. The team wants the simplest approach that keeps the validation logic inside the same pipeline with only one pass over the data. Which Beam pattern best satisfies these requirements?

Enable ignoreUnknownValues on BigQueryIO so that rows violating the rules are silently dropped during streaming inserts.
Use GroupByKey followed by CoGroupByKey to partition valid and invalid elements, then write each PCollection to its respective sink.
Configure a dead-letter topic on the input Pub/Sub subscription so that schema-violating messages are automatically rerouted without Dataflow code changes.
Add a ParDo that applies the validation rules and uses TupleTag side outputs to send invalid records to a secondary Pub/Sub sink while forwarding valid records to BigQuery.
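
The last option describes per-element validation and routing with side outputs. Below is a minimal sketch of that logic in plain Python, so it runs without the Beam SDK. `SideOutput`, `INVALID_TAG`, `is_valid`, `validate_and_tag`, and `route` are hypothetical names. `SideOutput` only stands in for Beam's `beam.pvalue.TaggedOutput`. Forwarding invalid or unparseable messages as their original bytes is an illustrative choice; the question does not specify it.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List, Tuple

# Hypothetical tag for the dead-letter branch (Beam Python would use the
# same string in beam.pvalue.TaggedOutput and ParDo.with_outputs).
INVALID_TAG = "invalid"


@dataclass(frozen=True)
class SideOutput:
    """Stand-in for beam.pvalue.TaggedOutput so the sketch runs without Beam."""
    tag: str
    value: Any


def is_valid(event: dict) -> bool:
    """Rules from the question: non-empty userId, numeric purchaseAmount > 0."""
    user_id = event.get("userId")
    amount = event.get("purchaseAmount")
    has_user = isinstance(user_id, str) and user_id.strip() != ""
    # bool is a subclass of int in Python, so reject True/False explicitly.
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    # amount > 0 is only evaluated when amount is a number (short-circuit).
    return has_user and is_number and amount > 0


def validate_and_tag(raw: bytes) -> Iterator[Any]:
    """Body of a hypothetical DoFn.process: one pass, two outputs."""
    try:
        event = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        yield SideOutput(INVALID_TAG, raw)
        return
    if isinstance(event, dict) and is_valid(event):
        yield event  # main output, destined for BigQuery
    else:
        yield SideOutput(INVALID_TAG, raw)  # side output, destined for Pub/Sub


def route(messages: Iterable[bytes]) -> Tuple[List[dict], List[bytes]]:
    """Split outputs by tag, roughly what Beam does for a multi-output ParDo."""
    valid: List[dict] = []
    invalid: List[bytes] = []
    for raw in messages:
        for out in validate_and_tag(raw):
            if isinstance(out, SideOutput) and out.tag == INVALID_TAG:
                invalid.append(out.value)
            else:
                valid.append(out)
    return valid, invalid
```

In an actual Beam Python pipeline, the body of `validate_and_tag` would likely live in the `process` method of a `beam.DoFn` subclass. It would yield `beam.pvalue.TaggedOutput(INVALID_TAG, raw)` instead of `SideOutput`, and it would be applied with `beam.ParDo(fn).with_outputs(INVALID_TAG, main="valid")`. This wiring is an assumption about how a team might structure it. The returned object exposes `.valid` for `beam.io.WriteToBigQuery` and `.invalid` for `beam.io.WriteToPubSub`. The Java SDK expresses the same idea with `TupleTag` objects passed to `ParDo.of(fn).withOutputTags(mainTag, TupleTagList.of(invalidTag))`. Either way, each element is inspected once and no shuffle is required.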

GCP Professional Data Engineer

Ingesting and processing the data
