Your manufacturing company collects 150 000 JSON telemetry events per second from thousands of factory devices worldwide. Dashboards in BigQuery must reflect events within 30 seconds of publication. Devices occasionally emit malformed JSON that should be quarantined for later inspection without interrupting ingest. The team wants a fully managed, autoscaling solution that minimizes ongoing operations. Which architecture best satisfies these requirements?
Trigger a Cloud Function for each message delivered by a Pub/Sub push subscription and insert the event into BigQuery; wrap the insert in a try/catch block that logs malformed JSON to Cloud Logging.
Publish events to a Pub/Sub topic that has a dead-letter topic enabled; run an autoscaling Dataflow streaming pipeline that parses the JSON, writes valid rows to BigQuery via the Storage Write API, and routes parsing failures to the dead-letter topic.
Deploy a long-lived Spark Streaming job on a Dataproc cluster that consumes the Pub/Sub topic, cleans the data, writes to BigQuery, and stores malformed records in an HDFS directory.
Have devices write newline-delimited JSON files to Cloud Storage and configure a BigQuery load job every 15 minutes with an error log destination for rows that fail to parse.
Publishing events to a Pub/Sub topic provides a serverless ingestion layer that scales automatically to high throughput. A streaming Dataflow job can subscribe to the topic, parse the JSON, and write valid rows to BigQuery through the BigQuery Storage Write API, which makes data queryable within seconds. The pipeline can send parsing failures to a Pub/Sub dead-letter topic (or a side output) so bad records are isolated without stopping the job. This combination is fully managed and autoscaling and requires no cluster maintenance. The other options fall short: Cloud Functions would face concurrency limits and per-invocation overhead at 150 000 msg/s; Dataproc introduces cluster administration work and is not serverless; and batch loads from Cloud Storage every 15 minutes cannot meet the 30-second latency target.
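The parse-and-quarantine step described above can be sketched in plain Python. This is a minimal illustration of the routing logic only, not Dataflow code: the function name `route_events`, the `Routed` container, and the dead-letter record fields (`payload`, `error`) are hypothetical names chosen for the example. In a real streaming pipeline the same branching would presumably live inside a processing step that emits valid rows on its main output and failures on a side output.

```python
import json
from typing import Dict, Iterable, List, NamedTuple


class Routed(NamedTuple):
    """Hypothetical container for the two outputs of the parse step."""
    valid: List[Dict]        # rows destined for BigQuery
    dead_letter: List[Dict]  # quarantined records for later inspection


def route_events(raw_messages: Iterable[bytes]) -> Routed:
    """Parse each message; keep good rows, quarantine anything malformed."""
    valid: List[Dict] = []
    dead_letter: List[Dict] = []
    for raw in raw_messages:
        try:
            event = json.loads(raw)
            # Assumption for this sketch: each event must be a JSON object.
            if not isinstance(event, dict):
                raise ValueError("expected a JSON object")
            valid.append(event)
        except ValueError as exc:
            # json.JSONDecodeError and UnicodeDecodeError both subclass
            # ValueError, so one handler covers every parse failure here.
            dead_letter.append({
                "payload": raw.decode("utf-8", errors="replace"),
                "error": str(exc),
            })
    return Routed(valid, dead_letter)


if __name__ == "__main__":
    messages = [
        b'{"device_id": "d1", "temp": 21.5}',
        b'{"device_id": ',  # truncated, malformed JSON
    ]
    result = route_events(messages)
    print(len(result.valid), "valid;", len(result.dead_letter), "quarantined")
```

The key design point is that a bad record never raises out of the loop: it is captured with its original payload and the error message, so ingest continues and the record can be inspected later.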
GCP Professional Data Engineer
Ingesting and processing the data