Your team is designing a streaming Dataflow pipeline that ingests JSON events from Pub/Sub, enriches them, and writes the results to BigQuery. Every event must contain a non-empty "userId" field and a numeric "purchaseAmount" greater than zero. Records that fail either rule must be excluded from the BigQuery sink and instead sent to a separate Pub/Sub topic for later analysis. The team wants the simplest approach that keeps the validation logic inside the same pipeline with only one pass over the data. Which Beam pattern best satisfies these requirements?
Enable ignoreUnknownValues on BigQueryIO so that rows violating the rules are silently dropped during streaming inserts.
Use GroupByKey followed by CoGroupByKey to partition valid and invalid elements, then write each PCollection to its respective sink.
Configure a dead-letter topic on the input Pub/Sub subscription so that schema-violating messages are automatically rerouted without Dataflow code changes.
Add a ParDo that applies the validation rules and uses TupleTag side outputs to send invalid records to a secondary Pub/Sub sink while forwarding valid records to BigQuery.
Using a ParDo that evaluates each element and emits to multiple outputs identified by TupleTags lets you apply the validation logic in a single transformation. The main output can carry valid elements directly to the BigQuery sink, while the side output containing the rejected elements can be fed to a Pub/Sub sink. This approach avoids additional shuffles, keeps the work in one job, and requires minimal boilerplate.
GroupByKey/CoGroupByKey would perform an expensive shuffle and is unnecessary for record-level validation.
Setting ignoreUnknownValues on BigQueryIO only ignores extra columns; rows failing required-field or range checks still cause errors and are not automatically routed elsewhere.
Pub/Sub dead-letter topics handle messages that cannot be delivered or acknowledged, not content validation performed after the message is read by Dataflow.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is a ParDo in Apache Beam?
Open an interactive chat with Bash
What are TupleTags and how do they work in Apache Beam?
Open an interactive chat with Bash
Why is GroupByKey/CoGroupByKey inefficient for record-level validation?
Open an interactive chat with Bash
What is a ParDo in Apache Beam?
Open an interactive chat with Bash
What are TupleTags and how are they used in Beam pipelines?
Open an interactive chat with Bash
Why is GroupByKey/CoGroupByKey not suitable for this use case?
Open an interactive chat with Bash
GCP Professional Data Engineer
Ingesting and processing the data
Your Score:
Report Issue
Bash, the Crucial Exams Chat Bot
AI Bot
Loading...
Loading...
Loading...
Pass with Confidence.
IT & Cybersecurity Package
You have hit the limits of our free tier, become a Premium Member today for unlimited access.
Military, Healthcare worker, Gov. employee or Teacher? See if you qualify for a Community Discount.
Monthly
$19.99
$19.99/mo
Billed monthly, Cancel any time.
3 Month Pass
$44.99
$14.99/mo
One time purchase of $44.99, Does not auto-renew.
MOST POPULAR
Annual Pass
$119.99
$9.99/mo
One time purchase of $119.99, Does not auto-renew.
BEST DEAL
Lifetime Pass
$189.99
One time purchase, Good for life.
What You Get
All IT & Cybersecurity Package plans include the following perks and exams .