GCP Professional Data Engineer Practice Question

Your team receives hourly Avro files compressed with gzip in Cloud Storage paths like gs://raw-bucket/dt=YYYYMMDD-HH/*.avro.gz. About 100 GB arrives per hour, and analysts must be able to query the data in BigQuery within 15 minutes of arrival. Additional requirements: keep ingestion cost as low as possible, load each hour atomically (all or nothing), automatically accept newly added nullable fields, and partition the target table by ingestion time. Which approach best meets the requirements?

  • Build a batch Dataflow pipeline that reads the Avro files from Cloud Storage and writes the records to BigQuery through the Storage Write API's default stream.

  • Configure a Cloud Storage finalize event to trigger a Cloud Function that starts a BigQuery load job using a wildcard URI for the hour's Avro files, WRITE_APPEND disposition, schema_update_options = ALLOW_FIELD_ADDITION, and ingestion-time partitioning (see the sketch after this list).

  • Create a permanent external table over the Avro objects and run scheduled queries each hour to copy data into a destination table.

  • Schedule hourly Dataproc Spark jobs in Cloud Composer to parse the Avro files and insert the rows into BigQuery using the legacy streaming insert API.
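
For context on the second option, below is a minimal sketch of the event-driven load-job pattern it describes, written as a 1st-gen background Cloud Function in Python. The destination table name is hypothetical, and production code would also need to avoid starting a duplicate load for every object finalized within the same hour.

    # A minimal sketch of the load-job approach, assuming a 1st-gen background
    # Cloud Function triggered on google.storage.object.finalize. The bucket
    # layout matches the question; the destination table name is hypothetical.
    from google.cloud import bigquery

    def load_hour(event, context):
        """Start a BigQuery load job over the finalized object's hourly prefix."""
        client = bigquery.Client()

        # event["name"] looks like "dt=YYYYMMDD-HH/part-0001.avro.gz";
        # take the hour directory and load every file under it at once.
        hour_prefix = event["name"].split("/")[0]
        source_uri = f"gs://{event['bucket']}/{hour_prefix}/*.avro.gz"

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.AVRO,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
            schema_update_options=[
                bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION
            ],
            # No partitioning field specified, so the destination table
            # is partitioned by ingestion time.
            time_partitioning=bigquery.TimePartitioning(
                type_=bigquery.TimePartitioningType.DAY
            ),
        )

        # A single load job is atomic: either all matched files load,
        # or no rows become visible in the destination table.
        load_job = client.load_table_from_uri(
            source_uri,
            "analytics.raw_events",  # hypothetical dataset.table
            job_config=job_config,
        )
        load_job.result()  # blocks until done; raises on failure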
