Your company is designing a streaming data pipeline on Google Cloud for a video-streaming platform. Mobile devices publish about 500,000 events per second. A personalization microservice must query the most recent events with single-digit-millisecond latency, while the marketing team needs interactive SQL access to 13 months of raw events, with queries that scan terabytes each day. Operations staff want a managed, serverless solution and require that processing VMs have no public IP addresses. Which architecture best satisfies these goals?
Ingest events with Pub/Sub and use a single Dataflow job to write all data to time-partitioned BigQuery tables; have the personalization service query BigQuery through BI Engine; disable public IPs on Dataflow workers.
Use Pub/Sub for ingestion; run Dataflow workers without public IPs to branch the stream, writing concurrently to Cloud Bigtable for the personalization microservice and to partitioned BigQuery tables for marketing analytics.
Deploy Apache Kafka on Compute Engine and process the stream with Spark Streaming on Dataproc, persisting events in Cloud SQL for both personalization and marketing queries.
Ingest with Pub/Sub and have Dataflow write Avro files to Cloud Storage; marketers query the files with Presto on Dataproc, and the personalization service retrieves recent files via Cloud Storage through Cloud CDN.
Cloud Bigtable is built for low-latency, high-throughput reads and writes by row key, so it suits the personalization service, which needs single-digit-millisecond access to the most recent events. BigQuery provides serverless, petabyte-scale SQL analytics, which fits the marketing team's interactive ad hoc queries over 13 months of data; time-partitioned tables let those queries scan only the date ranges they need. A single Dataflow streaming job can read from Pub/Sub and branch the stream to multiple sinks, writing each record both to Bigtable and to a partitioned BigQuery table. Dataflow workers can run without public IP addresses inside a VPC (the subnet needs Private Google Access so the workers can still reach Google APIs), which meets the networking requirement. The other options each fail at least one requirement: BigQuery, even with BI Engine, is not designed for single-digit-millisecond point lookups at this request rate; Kafka on Compute Engine and Spark Streaming on Dataproc are not serverless, and Cloud SQL cannot sustain this scale; Cloud Storage behind Cloud CDN cannot deliver millisecond access to fresh events, and Presto on Dataproc is not serverless.
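To make the "branch the stream to two sinks" idea more concrete, here is a minimal plain-Python sketch of the per-record shaping each branch might apply. It is not a Dataflow pipeline and not the question author's design: the event fields (`user_id`, `event_type`, `event_ts_ms`), the function names, and the `MAX_MILLIS` bound are all hypothetical. It assumes a common Bigtable pattern of putting a reversed timestamp in the row key so that a user's newest events sort first, and a BigQuery row that carries a timestamp column the table could be partitioned on. In a real job, these functions would sit inside the pipeline's transforms ahead of the Bigtable and BigQuery writes.

```python
from datetime import datetime, timezone

# Assumption: an upper bound larger than any epoch-millisecond timestamp
# the pipeline will see, used to reverse the sort order of timestamps.
MAX_MILLIS = 10**13


def to_bigtable_row_key(user_id: str, event_ts_ms: int) -> str:
    """Build a row key so that a user's newest events sort first.

    Bigtable stores rows in lexicographic key order, so subtracting the
    timestamp from a fixed maximum and zero-padding it makes a prefix scan
    on "<user_id>#" return the most recent events first.
    """
    reversed_ts = MAX_MILLIS - event_ts_ms
    return f"{user_id}#{reversed_ts:013d}"


def to_bigquery_row(event: dict) -> dict:
    """Shape an event as a row for a table partitioned on event_ts."""
    ts = datetime.fromtimestamp(event["event_ts_ms"] / 1000, tz=timezone.utc)
    return {
        "user_id": event["user_id"],
        "event_type": event["event_type"],
        "event_ts": ts.isoformat(),
    }


def branch(event: dict) -> tuple:
    """Produce both outputs for one event: (Bigtable row key, BigQuery row)."""
    key = to_bigtable_row_key(event["user_id"], event["event_ts_ms"])
    return key, to_bigquery_row(event)


if __name__ == "__main__":
    sample = {"user_id": "u1", "event_type": "play", "event_ts_ms": 1_700_000_000_000}
    row_key, bq_row = branch(sample)
    print(row_key)
    print(bq_row)
```

The point of the reversed timestamp is that the personalization service can read "the latest N events for this user" as a short prefix scan instead of filtering or sorting, which is what keeps the lookup in the low-millisecond range.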
GCP Professional Data Engineer
Ingesting and processing the data