GCP Professional Data Engineer Practice Question

Your manufacturing company collects 150,000 JSON telemetry events per second from thousands of factory devices worldwide. Dashboards in BigQuery must reflect events within 30 seconds of publication. Devices occasionally emit malformed JSON that should be quarantined for later inspection without interrupting ingestion. The team wants a fully managed, autoscaling solution that minimizes ongoing operations. Which architecture best satisfies these requirements? (A sketch of the pipeline pattern described in the last option follows the list.)

  • Deploy a long-lived Spark Streaming job on a Dataproc cluster that consumes the Pub/Sub topic, cleans the data, writes to BigQuery, and stores malformed records in an HDFS directory.

  • Trigger a Cloud Function for each message delivered by a Pub/Sub push subscription and insert the event into BigQuery; wrap the insert in a try/catch block that logs malformed JSON to Cloud Logging.

  • Have devices write newline-delimited JSON files to Cloud Storage and configure a BigQuery load job every 15 minutes with an error log destination for rows that fail to parse.

  • Publish events to a Pub/Sub topic that has a dead-letter topic enabled; run an autoscaling Dataflow streaming pipeline that parses the JSON, writes valid rows to BigQuery via the Storage Write API, and routes parsing failures to the dead-letter topic.
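For illustration, here is a minimal sketch of the Pub/Sub-to-BigQuery streaming pattern described in the last option, written with the Apache Beam Python SDK (the runner would be Dataflow). The project, subscription, topic, and table names are hypothetical, and the BigQuery table is assumed to already exist:

```python
# Hedged sketch: resource names (my-project, telemetry-sub, telemetry.events,
# telemetry-dead-letter) are placeholders, not values from the question.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

PARSE_FAILURES = "parse_failures"  # side-output tag for malformed payloads


class ParseTelemetry(beam.DoFn):
    def process(self, message):
        try:
            # Valid JSON becomes a BigQuery row dict on the main output.
            yield json.loads(message.decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            # Malformed payloads go to a side output instead of failing the bundle.
            yield beam.pvalue.TaggedOutput(PARSE_FAILURES, message)


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        parsed = (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/telemetry-sub")
            | "ParseJson" >> beam.ParDo(ParseTelemetry()).with_outputs(
                PARSE_FAILURES, main="valid")
        )

        # Valid rows stream into BigQuery through the Storage Write API.
        parsed.valid | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:telemetry.events",
            method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )

        # Parsing failures are republished to a dead-letter topic for inspection.
        parsed[PARSE_FAILURES] | "WriteDeadLetter" >> beam.io.WriteToPubSub(
            topic="projects/my-project/topics/telemetry-dead-letter")


if __name__ == "__main__":
    run()
```

Because both Pub/Sub and Dataflow autoscale and are fully managed, this pattern meets the latency and operational requirements while keeping malformed records quarantined rather than dropped.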
