Your organization receives 4 TB of JSON telemetry each day from hundreds of thousands of IoT devices. The events must be filtered for malformed records, deduplicated by device ID and timestamp, enriched with a 100 MB reference table that updates weekly, and streamed into partitioned BigQuery tables with sub-minute latency. Data engineers also need to rerun the identical logic over months of archived data for occasional backfills. Operations teams require automatic horizontal scaling and no cluster management. Which Google Cloud solution best satisfies all requirements?
Load raw events directly into BigQuery with streaming inserts and use Dataform SQL models to cleanse, deduplicate, and enrich data in place.
Implement an Apache Beam pipeline on Cloud Dataflow that streams events, uses a weekly-refreshed side input for enrichment, and can be re-run in batch for backfills.
Create a Cloud Data Fusion pipeline with Wrangler transforms, triggered by Cloud Composer, and manually scale the underlying Dataproc cluster for spikes.
Run Spark Streaming on a long-running Cloud Dataproc cluster, coupled with scheduled Dataprep jobs for cleaning and a separate batch Spark job for backfills.
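The Apache Beam pipeline on Cloud Dataflow is the best fit. Dataflow executes Beam pipelines in either streaming or batch mode from the same code base, so the logic that serves the live feed can be rerun unchanged over archived data for backfills. It supports user-defined transforms for complex cleansing and deduplication, side inputs for periodically refreshed lookup tables such as the 100 MB reference table, exactly-once writes to BigQuery (for example through the Storage Write API), and automatic horizontal scaling with no cluster to manage. The Dataproc option requires provisioning and operating a long-running cluster and maintaining separate streaming and batch jobs. The Cloud Data Fusion option also depends on a manually scaled Dataproc cluster. Data Fusion and Dataform focus on GUI-based or SQL-based ETL and are not well suited to high-volume, low-latency streaming with code shared between batch and streaming workloads.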
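The idea of reusing one code base for streaming and backfills can be illustrated with a minimal sketch in plain Python. It does not use the Beam SDK. It only shows the per-record logic (malformed-record filtering, deduplication by device ID and timestamp, and enrichment from a lookup table) written as functions that do not depend on where the records come from. The field names `device_id` and `ts`, the `DEVICE_INFO` table, and all function names are assumptions made for illustration. The in-memory `seen` set stands in for the windowed or stateful deduplication a real pipeline would need.

```python
import json
from typing import Iterable, Iterator, Optional

# Hypothetical reference table. In a Beam pipeline this would be supplied
# as a side input and refreshed on a schedule.
DEVICE_INFO = {"dev-1": {"region": "eu"}, "dev-2": {"region": "us"}}


def parse_event(raw: str) -> Optional[dict]:
    """Return the parsed event, or None if the record is malformed."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    # Assumed required fields; a real schema would be stricter.
    if "device_id" not in event or "ts" not in event:
        return None
    return event


def dedup_key(event: dict) -> tuple:
    """Key used to detect duplicates: (device ID, timestamp)."""
    return (event["device_id"], event["ts"])


def enrich(event: dict, reference: dict) -> dict:
    """Merge reference attributes for the device into the event, if any."""
    extra = reference.get(event["device_id"], {})
    return {**event, **extra}


def process(raw_events: Iterable[str], reference: dict) -> Iterator[dict]:
    """Apply filter, dedup, and enrich to any iterable of raw JSON strings.

    The same function works on a live feed or an archived file, which is
    the property the explanation attributes to a shared code base.
    """
    seen = set()  # illustration only; unbounded state is not production-safe
    for raw in raw_events:
        event = parse_event(raw)
        if event is None:
            continue  # drop malformed records
        key = dedup_key(event)
        if key in seen:
            continue  # drop duplicates
        seen.add(key)
        yield enrich(event, reference)
```

Because `process` accepts any iterable, a backfill could pass lines read from archived files while a live consumer passes messages as they arrive. In an actual Beam pipeline the same per-record functions would be wrapped in transforms, and only the source would change between the two modes.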
GCP Professional Data Engineer
Designing data processing systems