Your team is building a streaming pipeline that ingests sensor events from Pub/Sub, enriches them in Dataflow, and writes the results to a BigQuery table used by near-real-time dashboards. Compliance requires that each record contain a non-null device_id and that the temperature field be within −50 °C to 150 °C. Bad records must be captured for offline review without stopping the pipeline, and data quality violations above 2 % of the stream should raise an on-call alert. Which design best meets these requirements with the least operational overhead?
Add a validation ParDo (or schema-aware Validate transform) in the Dataflow pipeline that routes records failing the device_id and temperature checks to a side output written to a dead-letter BigQuery table, while emitting Counter metrics exported to Cloud Monitoring with an alert set to trigger when invalid records exceed 2 % of throughput.
Write all events directly to BigQuery and schedule Dataform assertions every 10 minutes to query for null device_id or out-of-range temperatures; if violations exceed 2 %, send an alert and delete the bad rows.
Disable streaming inserts and instead write all Pub/Sub data to Cloud Storage, then load it hourly into BigQuery with load jobs that rely on schema enforcement; examine any load job errors and configure Cloud Functions to page the team when loads fail.
Invoke Cloud Data Loss Prevention (DLP) from Cloud Functions subscribed to Pub/Sub; if DLP finds any sensitive content or invalid temperature values, push the record to a separate Pub/Sub topic and alert via Cloud Tasks.
Dataflow lets you embed custom validation logic inside an Apache Beam pipeline. By adding a ParDo (or schema-aware Validate transform) that checks for a non-null device_id and a valid temperature range, you can emit the main output for valid events and a tagged side output for invalid events. Writing the invalid output to a dedicated BigQuery dead-letter table (or Cloud Storage bucket) preserves the bad records for later inspection, while sending the valid stream on to the main BigQuery table keeps dashboards current. Within the same ParDo you can increment a custom Counter for each rejected record. Dataflow automatically exports user Counters to Cloud Monitoring, where you create an alerting policy that fires if the percentage of invalid records rises above the 2 % threshold. This approach provides in-stream validation, prevents pipeline failures by isolating bad data, stores failures for auditing, and uses managed monitoring without introducing extra services or batch delays.
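As a rough sketch of that pattern (assuming a Python Beam pipeline and dict-like events carrying device_id and temperature; the table names and class are illustrative, not part of the question), the validation step might look like this:

```python
import apache_beam as beam
from apache_beam import pvalue
from apache_beam.metrics import Metrics


class ValidateSensorEvent(beam.DoFn):
    """Routes events failing the compliance checks to a tagged 'invalid' output."""

    INVALID_TAG = 'invalid'

    def __init__(self):
        # User counters are exported to Cloud Monitoring automatically by Dataflow.
        self.total = Metrics.counter(self.__class__, 'total_records')
        self.invalid = Metrics.counter(self.__class__, 'invalid_records')

    def process(self, event):
        self.total.inc()
        device_id = event.get('device_id')
        temperature = event.get('temperature')
        if device_id is None or temperature is None or not (-50.0 <= temperature <= 150.0):
            self.invalid.inc()
            yield pvalue.TaggedOutput(self.INVALID_TAG, event)  # dead-letter path
        else:
            yield event  # main output, written to the production BigQuery table


# Wiring the two outputs (table names are placeholders):
# results = events | beam.ParDo(ValidateSensorEvent()).with_outputs(
#     ValidateSensorEvent.INVALID_TAG, main='valid')
# results.valid   | beam.io.WriteToBigQuery('project:dataset.sensor_events')
# results.invalid | beam.io.WriteToBigQuery('project:dataset.sensor_events_dead_letter')
```

In Cloud Monitoring the two counters surface as Dataflow user-counter metrics, so an alerting policy can compare invalid_records against total_records and notify on-call when the ratio exceeds 2 %.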
The alternative answers fall short:
Relying on hourly Cloud Storage loads and BigQuery schema enforcement delegates validation to the warehouse: load jobs add at least an hour of latency that defeats the near-real-time dashboards, schema checks cannot express the temperature-range rule, and paging only on load-job failures provides neither a per-record dead letter nor a 2 % threshold alert.
Using Dataform assertions after loading means invalid data can pollute production dashboards until the next batch run, does not stop bad records proactively, and deleting the offending rows discards them instead of preserving them for offline review.
Running Cloud DLP inspection addresses sensitive data detection, not numeric range or null checks, and adds unnecessary cost and latency.