Your media company ingests millions of click events per second into Pub/Sub. Design a pipeline that:
1. Groups events into user sessions separated by 30 minutes of inactivity and writes the enriched session records to BigQuery with sub-minute latency.
2. After each UTC day ends, executes a dependent SQL job that aggregates the previous day's sessions into a summary table.

You want minimal infrastructure management and a clear separation between the real-time transformation and the daily batch step. Which solution meets these requirements?
Create a streaming Apache Beam pipeline on Dataflow that applies 30-minute session windows and writes to BigQuery via the Storage Write API; orchestrate a Cloud Composer DAG that, once the Dataflow watermark passes midnight UTC, triggers a BigQuery SQL job to build the daily summary table.
Invoke Cloud Functions for every Pub/Sub event to write raw clicks to Bigtable; configure a daily BigQuery Data Transfer from Bigtable at 00:00 UTC and rely on materialized views for real-time sessionization.
Run a long-lived Spark Streaming job on Dataproc to perform sessionization and write results to Cloud Storage; use Cloud Scheduler to invoke a Cloud Function at 00:10 UTC that runs a BigQuery aggregation query.
Use Cloud Data Fusion in batch mode to read from Pub/Sub, apply sessionization with Wrangler transformations, and load sessions into Cloud SQL; call a Workflows orchestration each night to aggregate into BigQuery.
A serverless Apache Beam pipeline on Dataflow can apply session windows with a 30-minute gap duration to the streaming Pub/Sub events and write the results to BigQuery through the BigQuery Storage Write API, making sessions available to analysts in near real time. Cloud Composer (managed Apache Airflow) provides an orchestration layer that can wait for the streaming job's watermark to pass midnight UTC and then launch the dependent BigQuery SQL task that builds the daily summary table. This approach meets both the transformation and the orchestration requirements with the least operational overhead, and it keeps the real-time step and the daily batch step separate.
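To make the gap-based session semantics concrete, here is a minimal pure-Python sketch of how one user's click timestamps are merged into sessions with a 30-minute inactivity gap. It is not Beam code: in the real pipeline, Beam's `Sessions` window function performs this merging inside the runner (in the Python SDK, `beam.WindowInto(window.Sessions(30 * 60))` applied after keying events by user). The `Session` type and the `sessionize` function are hypothetical names for illustration, and timestamps are assumed to be UTC epoch seconds.

```python
from dataclasses import dataclass
from typing import Iterable, List

GAP_SECONDS = 30 * 60  # 30-minute inactivity gap from the requirements


@dataclass
class Session:
    """Hypothetical session record covering [start, end) in UTC epoch seconds."""
    user_id: str
    start: int
    end: int  # last event timestamp + gap, like a Beam session window's end
    event_count: int


def sessionize(user_id: str, timestamps: Iterable[int], gap: int = GAP_SECONDS) -> List[Session]:
    """Merge one user's event timestamps into gap-separated sessions."""
    sessions: List[Session] = []
    # Sort first: Pub/Sub does not guarantee delivery in event-time order.
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1].end:
            # Event overlaps the open session's window, so extend that session.
            current = sessions[-1]
            current.end = ts + gap
            current.event_count += 1
        else:
            # At least `gap` seconds of inactivity: start a new session.
            sessions.append(Session(user_id, ts, ts + gap, 1))
    return sessions


if __name__ == "__main__":
    for s in sessionize("user-42", [0, 600, 1500, 3300, 10_000]):
        print(s.start, s.end, s.event_count)
```

The usage example prints three sessions (`0 3300 3`, `3300 5100 1`, `10000 11800 1`). The event at 3300 arrives exactly 30 minutes after the previous one, so it starts a new session; only windows that actually overlap are merged.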
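The Dataproc option requires cluster provisioning and maintenance, writes sessions to Cloud Storage instead of BigQuery, and relies on Cloud Scheduler plus Cloud Functions, which lack built-in dependency management: the fixed 00:10 UTC trigger fires whether or not the streaming job has finished processing the previous day. The Cloud Data Fusion design runs in batch mode, so it cannot meet the sub-minute latency requirement, and it loads sessions into Cloud SQL rather than BigQuery. The Bigtable design relies on materialized views, which cannot perform real-time sessionization in BigQuery, and BigQuery Data Transfer Service does not pull data from Bigtable, so it satisfies neither requirement.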
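The gate that Cloud Composer enforces before the daily SQL job can be expressed as a small predicate. The sketch below is an illustration under stated assumptions, not Airflow or Dataflow API code: it assumes the DAG can obtain the streaming job's data watermark as a timezone-aware UTC datetime (for example, by polling the job's watermark metric), and `summary_cutoff` and `ready_for_daily_summary` are hypothetical names. Because a session window closes only at its last event plus the 30-minute gap, a session with clicks at 23:50 UTC is not emitted until the watermark reaches 00:20 UTC. The sketch therefore optionally adds the gap to the midnight cutoff as a safety margin.

```python
from datetime import datetime, timedelta, timezone

SESSION_GAP = timedelta(minutes=30)  # same inactivity gap as the streaming job


def summary_cutoff(day: datetime, include_gap: bool = True) -> datetime:
    """Watermark value to wait for before summarizing the UTC day containing `day`.

    `day` must be a timezone-aware datetime anywhere inside that UTC day.
    """
    day_utc = day.astimezone(timezone.utc)
    start_of_day = datetime(day_utc.year, day_utc.month, day_utc.day, tzinfo=timezone.utc)
    next_midnight = start_of_day + timedelta(days=1)
    # Sessions with events just before midnight close up to one gap later.
    return next_midnight + SESSION_GAP if include_gap else next_midnight


def ready_for_daily_summary(watermark: datetime, day: datetime, include_gap: bool = True) -> bool:
    """True once the streaming job's watermark has reached the cutoff for `day`."""
    return watermark.astimezone(timezone.utc) >= summary_cutoff(day, include_gap)
```

In an Airflow DAG, a predicate like this would typically sit in the poke logic of a sensor (for example, a `PythonSensor`) placed upstream of the BigQuery job task. Whether to include the gap depends on how the summary attributes sessions that straddle midnight.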
GCP Professional Data Engineer
Ingesting and processing the data