Your e-commerce company ingests tens of thousands of click events per second into Pub/Sub. Data engineers must build a pipeline that consumes the stream in real time, performs sliding-window aggregations, and writes the results to BigQuery. When business logic changes, the same code must also run in batch mode to reprocess a full day of raw event files stored in Cloud Storage. The team wants a fully managed, auto-scaling service that lets them implement the pipeline once in Python without having to create or manage clusters. Which Google Cloud service best satisfies these requirements?
Managed Spark Structured Streaming jobs on Cloud Dataproc clusters
Cloud Dataflow with the Apache Beam SDK
A Cloud Composer DAG orchestrating BigQuery SQL transformation jobs
A Cloud Data Fusion pipeline triggered by Pub/Sub and writing to BigQuery
Cloud Dataflow is a serverless, fully managed execution service for Apache Beam pipelines. Beam's unified model lets developers write a single pipeline in Python (or Java or Go) and run it in either streaming or batch mode: the transform logic stays the same, and typically only the source (Pub/Sub versus Cloud Storage files) and the streaming option change. Dataflow automatically provisions and scales worker resources, so there are no clusters to create or manage. Dataproc requires you to provision and manage Spark clusters, and streaming and batch workloads typically run as separate jobs. Cloud Data Fusion provides visual ETL, but it runs pipelines on underlying Dataproc clusters and does not let you write a single Beam Python pipeline for both modes. Cloud Composer is an orchestrator, not a data-processing engine, so it would still need another service to run the actual transformations. Therefore, Cloud Dataflow best meets all of the stated requirements.
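To make the sliding-window requirement concrete, here is a minimal standard-library sketch of how one event is assigned to every overlapping window, with the same counting function applied to a bounded list (batch) and to a generator (stream-like). It is not Beam or Dataflow code. The names `sliding_windows` and `count_clicks`, the `(timestamp, page)` event shape, and the 60-second size with a 30-second period are all assumptions made for illustration. In an actual Beam Python pipeline the equivalent windowing step would be `beam.WindowInto(window.SlidingWindows(size, period))`.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in seconds


def sliding_windows(ts: float, size: int, period: int) -> Iterator[Window]:
    """Yield every window of length `size`, starting every `period`
    seconds, that contains the timestamp `ts`."""
    # Latest window start that is at or before the timestamp.
    start = int(ts // period) * period
    # Walk backwards while the window still covers ts (start + size > ts).
    while start > ts - size:
        yield (start, start + size)
        start -= period


def count_clicks(events: Iterable[Tuple[float, str]],
                 size: int = 60, period: int = 30) -> Counter:
    """Count clicks per (window, page). Works on any iterable, so the
    same logic serves a finite list or an incrementally produced stream."""
    counts: Counter = Counter()
    for ts, page in events:
        for win in sliding_windows(ts, size, period):
            counts[(win, page)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample events: (timestamp in seconds, page path).
    sample = [(10, "/home"), (45, "/home"), (70, "/cart")]
    batch_result = count_clicks(sample)                 # bounded input
    stream_result = count_clicks(e for e in sample)     # generator input
    print(batch_result == stream_result)
```

Because each 60-second window overlaps the next by 30 seconds, every event lands in two windows. A real streaming runner also has to handle late data and decide when to emit results, which this sketch leaves out.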
GCP Professional Data Engineer
Ingesting and processing the data