Your company ingests real-time purchase events through a Cloud Pub/Sub topic, processes them with Dataflow, and writes the results to BigQuery. Leadership asks you to spin up a separate development project where engineers can test new transformation logic under production-like message volume, but no real customer PII may appear in the development environment. You must keep the production pipeline untouched and minimise engineering effort. Which approach best meets these requirements?
Use the Dataflow Streaming Data Generator template to publish synthetic purchase events, matching the production schema and rate, to a separate Pub/Sub topic that feeds the development pipeline.
Create a second subscription on the production Pub/Sub topic that samples 1% of messages and forwards them to the development pipeline.
Define a Pub/Sub schema with masked PII fields and let developers consume the same production topic in their project.
Export production BigQuery tables to Cloud Storage every hour, run a Dataflow job to redact PII, then load the cleansed files into a development BigQuery dataset for testing.
Using Dataflow's Streaming Data Generator template to publish synthetic events to a dedicated development Pub/Sub topic satisfies all constraints. It produces realistic, production-sized traffic that matches the original schema, so performance testing is meaningful, while the data are completely synthetic and contain no real PII. The production topic and pipeline remain unchanged.
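To make "synthetic events that match the schema" concrete, here is a minimal local sketch of generating such events. It is an illustration only, not the template's implementation: the field names (`event_id`, `customer_id`, `amount_cents`, `payment_method`, `event_time`) are hypothetical, since the question does not give the production schema, and the sketch only builds message payloads without publishing them.

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical values; the real production schema is not given in the question.
PAYMENT_METHODS = ["card", "wallet", "bank_transfer"]


def make_purchase_event(rng):
    """Build one purchase event in which every value is randomly generated."""
    return {
        "event_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        # The prefix makes it obvious that no real customer is referenced.
        "customer_id": f"synthetic-{rng.randrange(1_000_000):06d}",
        "amount_cents": rng.randint(100, 50_000),
        "payment_method": rng.choice(PAYMENT_METHODS),
        "event_time": datetime.now(timezone.utc).isoformat(),
    }


def generate_batch(count, seed=0):
    """Return `count` JSON-encoded payloads, ready to hand to a publisher."""
    rng = random.Random(seed)
    return [
        json.dumps(make_purchase_event(rng)).encode("utf-8")
        for _ in range(count)
    ]


if __name__ == "__main__":
    for payload in generate_batch(3):
        print(payload.decode("utf-8"))
```

Because every value comes from a random generator, no real customer record can reach the development project, while the message shape stays the same as in production. The managed template applies the same idea at a configurable message rate, so you do not have to write or operate a generator yourself.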
Sampling or creating an additional subscription to the production topic still transmits real customer data to development. Cloud Pub/Sub does not provide built-in field-level masking, so simply configuring a schema or filter cannot guarantee PII removal. Periodically exporting, anonymising, and re-importing production data into BigQuery would introduce delay and extra maintenance, and would not provide a continuous real-time stream for performance testing.
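The first two points can be illustrated with a small simulation. This is a sketch under stated assumptions, not Pub/Sub code: `sample_messages` and `matches_schema` are hypothetical helpers, the field names are invented, and the addresses are fake `example.com` values standing in for real PII.

```python
import random


def sample_messages(messages, rate, seed=0):
    """Keep each message with probability `rate` (hypothetical sampler)."""
    rng = random.Random(seed)
    return [m for m in messages if rng.random() < rate]


def matches_schema(message, required_fields):
    """Structural check only: each field is present and has the expected type."""
    return all(
        isinstance(message.get(name), expected_type)
        for name, expected_type in required_fields.items()
    )


# Hypothetical schema and stand-in "production" messages.
REQUIRED = {"customer_email": str, "amount_cents": int}
production = [
    {"customer_email": f"user{i}@example.com", "amount_cents": 100 + i}
    for i in range(10_000)
]

sampled = sample_messages(production, rate=0.01)
# Every sampled message still carries the unmasked email address.
leaked = [m for m in sampled if "@" in m["customer_email"]]
# A structural check accepts the messages without altering their contents.
all_valid = all(matches_schema(m, REQUIRED) for m in sampled)
```

Sampling reduces the volume of data but not its sensitivity, and a check on message structure accepts real values unchanged, so neither approach keeps PII out of development.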
GCP Professional Data Engineer
Designing data processing systems