GCP Professional Data Engineer Practice Question

An online marketplace receives approximately 150,000 JSON clickstream events per second (about 10 TB/day). Analysts need sub-second, ad hoc SQL analytics on two years of data, while data scientists occasionally train TensorFlow models that read the same dataset. The engineering team wants a fully managed, low-ops solution that automatically provides cheaper rates for rarely accessed historical data. Which Google Cloud storage approach best satisfies these requirements?

  • Write the events to Cloud Bigtable and export snapshots to Cloud Storage when analysts need to query historical data with Dataproc.

  • Ingest the events into partitioned tables in Cloud SQL for PostgreSQL and connect Data Studio for analyst queries.

  • Persist the raw JSON files in Cloud Storage Nearline and query them through Dataproc Spark SQL jobs scheduled by Cloud Composer.

  • Stream the events to BigQuery using the Storage Write API, store them in a time-partitioned and clustered native table, and rely on BigQuery's automatic long-term storage pricing for older partitions.
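
The last option is the one that meets every stated requirement, and a minimal sketch may help make it concrete. Assuming hypothetical project, dataset, and field names (all identifiers below are placeholders), the time-partitioned, clustered native table it describes could be defined with the Python BigQuery client as shown here; the high-throughput ingestion itself would run through the Storage Write API (for example from a Dataflow pipeline), which is omitted for brevity.

```python
from google.cloud import bigquery

# Placeholder identifiers; substitute your own project and dataset.
TABLE_ID = "my-project.marketplace.clickstream_events"

client = bigquery.Client()

schema = [
    bigquery.SchemaField("event_time", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("user_id", "STRING"),
    bigquery.SchemaField("event_type", "STRING"),
    # JSON column for the raw event payload (assumes a recent client
    # library version with JSON type support).
    bigquery.SchemaField("payload", "JSON"),
]

table = bigquery.Table(TABLE_ID, schema=schema)

# Daily partitioning on the event timestamp: analyst queries that filter
# on event_time scan only the matching partitions, and any partition not
# modified for 90 consecutive days is automatically billed at BigQuery's
# lower long-term storage rate, with no lifecycle jobs required.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_time",
)

# Clustering co-locates rows that share these key values within each
# partition, further cutting the bytes scanned by ad hoc filters.
table.clustering_fields = ["user_id", "event_type"]

client.create_table(table)
```

Data scientists can read the same table directly into TensorFlow through the BigQuery Storage Read API (for example via the tensorflow-io BigQuery reader), so no separate export path is needed for model training.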
