An online media company is rebuilding its clickstream ingestion pipeline on Google Cloud. About 80,000 JSON events per second are published from mobile devices to a Cloud Pub/Sub topic. A personalization microservice must be able to look up the latest events for any given user ID with single-digit-millisecond latency for up to seven days after ingestion. Data scientists will also run monthly aggregations over a full year of clickstream history in BigQuery. Which design for the initial sink that subscribes to Pub/Sub best meets these requirements while keeping the architecture simple and cost-efficient?
Write each message to BigQuery using streaming inserts into partitioned tables, and let the microservice query BigQuery directly for recent events.
Use a Dataflow pipeline to write events as Avro files to Cloud Storage and create external tables in BigQuery over the bucket for analytics.
Persist events in Cloud Bigtable using the user ID as the row key, then export the table daily to Cloud Storage and batch-load the files into BigQuery.
Trigger Cloud Functions for each Pub/Sub message to insert the event into Cloud SQL, and configure federated queries from BigQuery to Cloud SQL for analytics.
Cloud Bigtable is optimized for high-throughput writes from streaming sources such as Pub/Sub and provides consistent single-digit millisecond latency for key-based lookups, which satisfies the personalization service. Retaining one week of data in Bigtable keeps hot data available for low-latency access. You can then run a scheduled Dataflow job (or Bigtable export) that writes older data to Cloud Storage and loads it into BigQuery, enabling cost-efficient long-term analytical queries without burdening Bigtable.
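The row-key design is what makes the millisecond lookups work. Below is a minimal sketch using the google-cloud-bigtable Python client; the project, instance, table, and column-family names are illustrative, and it assumes the `events` column family was created with a seven-day max-age garbage-collection rule so week-old cells expire on their own.

```python
import datetime
import json

from google.cloud import bigtable

# Illustrative names; substitute your own project, instance, and table.
client = bigtable.Client(project="my-project")
table = client.instance("clickstream-instance").table("click_events")

MAX_MILLIS = 2**63 - 1  # used to reverse timestamps inside the row key


def write_event(event: dict) -> None:
    """Persist one click event keyed by user ID.

    The row key is "<user_id>#<reversed millis>", so a user's rows sort
    newest-first and a lookup only scans the head of that key range.
    A 7-day max-age GC rule on the column family (set at table creation)
    expires old cells without any cleanup job.
    """
    now_ms = int(datetime.datetime.now(datetime.timezone.utc).timestamp() * 1000)
    row_key = f"{event['user_id']}#{MAX_MILLIS - now_ms}".encode()
    row = table.direct_row(row_key)
    row.set_cell("events", "payload", json.dumps(event).encode())
    row.commit()


def latest_events(user_id: str, limit: int = 20) -> list[dict]:
    """Point lookup: the newest events for one user in a single range scan."""
    rows = table.read_rows(
        start_key=f"{user_id}#".encode(),
        end_key=f"{user_id}$".encode(),  # '$' sorts immediately after '#'
        limit=limit,
    )
    return [
        json.loads(r.cells["events"][b"payload"][0].value)
        for r in rows
    ]
```

At 80,000 events per second you would batch mutations (for example with Dataflow's Bigtable connector or `Table.mutate_rows`) rather than committing row by row; the sketch only illustrates the key layout and the lookup pattern, and it assumes user IDs never contain the `#` delimiter.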
Streaming directly into BigQuery would support the analytics but not the required point-read latency, and streaming inserts become expensive at 80,000 rows per second. Writing files to Cloud Storage first would serve analytics but not low-latency reads. Cloud SQL cannot ingest at this scale and offers neither horizontal scalability nor the required latency. Therefore, persisting events in Cloud Bigtable and exporting them periodically to BigQuery is the most appropriate choice.
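For the analytics leg, the exported Avro files can be batch-loaded into BigQuery, which avoids streaming-insert charges entirely. Here is a minimal sketch with the google-cloud-bigquery Python client; the bucket, dataset, and table names are illustrative, and the files are assumed to come from a scheduled Bigtable-to-Cloud-Storage export such as the Google-provided "Cloud Bigtable to Avro" Dataflow template.

```python
from google.cloud import bigquery

# Illustrative project, bucket, dataset, and table names.
bq = bigquery.Client(project="my-project")

load_job = bq.load_table_from_uri(
    "gs://my-clickstream-exports/click-events-*.avro",
    "my-project.analytics.clickstream_history",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.AVRO,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    ),
)
load_job.result()  # block until the batch load finishes
print(f"Loaded {load_job.output_rows} rows")
```

A Cloud Scheduler or Cloud Composer job can drive the export and load on the daily cadence the answer describes.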