GCP Professional Data Engineer Practice Question

An online media company is rebuilding its clickstream ingestion pipeline on Google Cloud. About 80,000 JSON events per second are published from mobile devices to a Cloud Pub/Sub topic. A personalization microservice must be able to look up the latest events for any given user ID with single-digit millisecond latency for up to seven days after ingestion. Data scientists will also run monthly aggregations on a full year of clickstream history in BigQuery. Which design for the initial sink that subscribes to Pub/Sub best meets these requirements while keeping the architecture simple and cost-efficient?

  • Write each message to BigQuery using streaming inserts into partitioned tables, and let the microservice query BigQuery directly for recent events.

  • Use a Dataflow pipeline to write events as Avro files to Cloud Storage and create external tables in BigQuery over the bucket for analytics.

  • Persist events in Cloud Bigtable using the user ID as the row key, then export the table daily to Cloud Storage and batch-load the files into BigQuery.

  • Trigger Cloud Functions for each Pub/Sub message to insert the event into Cloud SQL, and configure federated queries from BigQuery to Cloud SQL for analytics.
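For readers who want to see what the row-key pattern described in the Cloud Bigtable option looks like in practice, the sketch below is a minimal, illustrative example only. It assumes a hypothetical project "my-project", instance "clickstream-instance", table "click_events", and column family "ev" that already exist; the reversed-timestamp suffix makes each user's newest events sort first, so a bounded prefix scan returns the latest rows quickly.

```python
# Minimal sketch (assumed names: project "my-project", instance
# "clickstream-instance", table "click_events", column family "ev").
import json
import time

from google.cloud import bigtable

MAX_TS_MS = 2**63 - 1  # subtracting the timestamp reverses sort order: newest first

client = bigtable.Client(project="my-project")
table = client.instance("clickstream-instance").table("click_events")


def write_event(user_id: str, event: dict) -> None:
    """Persist one clickstream event keyed by user ID plus a reversed timestamp."""
    reverse_ts = MAX_TS_MS - int(time.time() * 1000)
    row = table.direct_row(f"{user_id}#{reverse_ts:019d}".encode())
    row.set_cell("ev", "payload", json.dumps(event).encode())
    row.commit()


def latest_events(user_id: str, limit: int = 10) -> list:
    """Return the most recent events for a user via a bounded prefix scan."""
    prefix = f"{user_id}#".encode()
    rows = table.read_rows(
        start_key=prefix,
        end_key=prefix + b"\xff",
        limit=limit,
    )
    return [
        json.loads(r.cells["ev"][b"payload"][0].value)
        for r in rows
    ]
```

In a production pipeline the writes would typically come from a Dataflow job subscribed to Pub/Sub rather than a bare client loop; the sketch only illustrates how a user-ID row key supports low-latency point lookups for recent events.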

GCP Professional Data Engineer
Ingesting and processing the data