GCP Professional Data Engineer Practice Question

Your company streams purchase events from Pub/Sub into BigQuery for near real-time dashboards. Compliance requires that any primary account number (PAN) in the field card_number is tokenized before it is written. Business analysts also need each record to contain a non-empty order_id and want duplicate order_id values to be discarded if they arrive again within 24 hours. You must keep end-to-end latency below five seconds and avoid managing cluster infrastructure. Which design should you implement to satisfy the cleansing requirements?

  • Deploy a long-running Dataproc Spark Streaming job that calls Cloud DLP for tokenization, removes duplicate order_id values, stores Parquet files in Cloud Storage, and triggers a BigQuery load job every hour.

  • Use the Pub/Sub to BigQuery streaming template without modification and rely on BigQuery policy tags to mask the card_number column, accepting all rows and deduplicating later with a nightly BigQuery MERGE job.

  • Build a streaming Dataflow pipeline that invokes Cloud DLP to tokenize card_number, filters out events with a null order_id, applies a 24-hour windowed Distinct on order_id, and writes the cleansed stream to BigQuery via the Storage Write API.

  • Create an hourly Cloud Data Fusion batch pipeline that pulls messages from Pub/Sub, uses the built-in Cloud DLP plugin to tokenize card_number, deduplicates on order_id, and then loads the result into BigQuery.
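
Whichever service runs them, the three cleansing rules in the question (tokenize card_number, require a non-empty order_id, discard a repeated order_id within 24 hours) can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the Cleanser class and tokenize_pan function are hypothetical names, the HMAC-based token is a stand-in for a real tokenization service such as Cloud DLP, and the in-memory dictionary stands in for whatever keyed state a streaming engine would provide.

```python
import hashlib
import hmac

DEDUP_TTL_SECONDS = 24 * 60 * 60  # the 24-hour duplicate horizon from the question


def tokenize_pan(pan: str, key: bytes) -> str:
    """Stand-in for a tokenization service (assumption): keyed, deterministic, one-way."""
    return hmac.new(key, pan.encode("utf-8"), hashlib.sha256).hexdigest()


class Cleanser:
    """Hypothetical helper that applies the three rules to one event at a time."""

    def __init__(self, key: bytes):
        self._key = key
        # order_id -> time (in seconds) at which the last accepted record arrived
        self._last_seen = {}

    def process(self, event: dict, now: float):
        """Return the cleansed event, or None if the event must be dropped."""
        order_id = event.get("order_id")
        if not order_id:
            return None  # missing or empty order_id

        seen_at = self._last_seen.get(order_id)
        if seen_at is not None and now - seen_at < DEDUP_TTL_SECONDS:
            return None  # duplicate inside the 24-hour horizon

        self._last_seen[order_id] = now
        cleaned = dict(event)  # copy so the caller's event is not mutated
        if cleaned.get("card_number"):
            cleaned["card_number"] = tokenize_pan(cleaned["card_number"], self._key)
        return cleaned
```

In a real streaming pipeline the per-key state would also need to expire after 24 hours so that it does not grow without bound; the sketch omits that for brevity.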

Objective: Ingesting and processing the data