GCP Professional Data Engineer Practice Question

Your company ingests real-time purchase events through a Cloud Pub/Sub topic, processes them with Dataflow, and writes the results to BigQuery. Leadership asks you to spin up a separate development project where engineers can test new transformation logic at production-like message volume, but no real customer PII may appear in the development environment. You must keep the production pipeline untouched and minimise engineering effort. Which approach best meets these requirements?

  • Use the Dataflow Streaming Data Generator template to publish synthetic purchase events, matching the production schema and rate, to a separate Pub/Sub topic that feeds the development pipeline.

  • Define a Pub/Sub schema with masked PII fields and let developers consume the same production topic in their project.

  • Export production BigQuery tables to Cloud Storage every hour, run a Dataflow job to redact PII, then load the cleansed files into a development BigQuery dataset for testing.

  • Create a second subscription on the production Pub/Sub topic that samples 1% of messages and forwards them to the development pipeline.
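The Streaming Data Generator option works by pointing the Google-provided Dataflow Flex Template at a user-supplied JSON message schema and a target QPS, so the development Pub/Sub topic receives realistic traffic with no real customer data. The sketch below mimics the kind of synthetic payload such a schema would produce; the field names (`event_id`, `user_id`, `sku`, and so on) are hypothetical stand-ins for a production purchase-event schema, not taken from the question.

```python
import json
import random
import time
import uuid

# Hypothetical purchase-event schema; every field is synthetic,
# so no real customer PII can reach the development environment.
def synthetic_purchase_event():
    """Generate one fake purchase event matching an assumed production schema."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": f"test-user-{random.randint(1, 100_000)}",  # synthetic ID, not a real customer
        "sku": random.choice(["SKU-001", "SKU-002", "SKU-003"]),
        "amount": round(random.uniform(1.0, 500.0), 2),
        "currency": "USD",
        "event_ts": int(time.time() * 1000),  # epoch millis
    }

if __name__ == "__main__":
    # In the real solution, the Streaming Data Generator template publishes
    # messages like this to a dev Pub/Sub topic at a configured rate.
    print(json.dumps(synthetic_purchase_event()))
```

In practice you would not run this script yourself: you would launch the Streaming Data Generator Flex Template (via the console or `gcloud dataflow flex-template run`) with the schema and target rate as parameters, which is why this option requires the least engineering effort while leaving the production pipeline untouched.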

Exam domain: Designing data processing systems (GCP Professional Data Engineer)