GCP Professional Data Engineer Practice Question

Your manufacturing company collects 150,000 JSON telemetry events per second from thousands of factory devices worldwide. Dashboards in BigQuery must reflect events within 30 seconds of publication. Devices occasionally emit malformed JSON that should be quarantined for later inspection without interrupting ingestion. The team wants a fully managed, autoscaling solution that minimizes ongoing operations. Which architecture best satisfies these requirements? (A sketch of the pipeline pattern described in the last option follows the list.)

  • Deploy a long-lived Spark Streaming job on a Dataproc cluster that consumes the Pub/Sub topic, cleans the data, writes to BigQuery, and stores malformed records in an HDFS directory.

  • Trigger a Cloud Function for each message delivered by a Pub/Sub push subscription and insert the event into BigQuery; wrap the insert in a try/catch block that logs malformed JSON to Cloud Logging.

  • Have devices write newline-delimited JSON files to Cloud Storage and configure a BigQuery load job every 15 minutes with an error log destination for rows that fail to parse.

  • Publish events to a Pub/Sub topic that has a dead-letter topic enabled; run an autoscaling Dataflow streaming pipeline that parses the JSON, writes valid rows to BigQuery via the Storage Write API, and routes parsing failures to the dead-letter topic.
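For illustration, here is a minimal sketch of the Pub/Sub-to-BigQuery streaming pattern described in the last option, written with the Apache Beam Python SDK (the runner would be Dataflow). The project, subscription, topic, and table names are hypothetical, and the BigQuery table is assumed to already exist:

```python
# Hedged sketch: resource names (my-project, telemetry-sub, telemetry.events,
# telemetry-dead-letter) are placeholders, not values from the question.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

PARSE_FAILURES = "parse_failures"  # side-output tag for malformed payloads


class ParseTelemetry(beam.DoFn):
    def process(self, message):
        try:
            # Valid JSON becomes a BigQuery row dict on the main output.
            yield json.loads(message.decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            # Malformed payloads go to a side output instead of failing the bundle.
            yield beam.pvalue.TaggedOutput(PARSE_FAILURES, message)


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        parsed = (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/telemetry-sub")
            | "ParseJson" >> beam.ParDo(ParseTelemetry()).with_outputs(
                PARSE_FAILURES, main="valid")
        )

        # Valid rows stream into BigQuery through the Storage Write API.
        parsed.valid | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:telemetry.events",
            method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )

        # Parsing failures are republished to a dead-letter topic for inspection.
        parsed[PARSE_FAILURES] | "WriteDeadLetter" >> beam.io.WriteToPubSub(
            topic="projects/my-project/topics/telemetry-dead-letter")


if __name__ == "__main__":
    run()
```

Because both Pub/Sub and Dataflow autoscale and are fully managed, this pattern meets the latency and operational requirements while keeping malformed records quarantined rather than dropped.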
