GCP Professional Data Engineer Practice Question

A healthcare analytics provider stores daily CSV exports of patient monitoring metrics in a Google Cloud Storage bucket. New files are added around 02:00 UTC and are never modified after creation. Business analysts query the data in BigQuery and require the previous day's data to be available by 06:00 UTC. Operations wants a single managed service that can schedule a daily load, detect only newly arrived objects, and avoid the need for custom code or customer-managed VMs. Which design meets these requirements?

  • Configure BigQuery Data Transfer Service with the Cloud Storage connector to run daily at 03:00 UTC and load the files from the bucket prefix into a date-partitioned BigQuery table.

  • Build a Dataflow batch pipeline orchestrated by Cloud Composer that processes new objects from the bucket and writes them to BigQuery.

  • Create a Cloud Scheduler job that invokes a Cloud Function each morning to run a bq load command for all objects in the bucket.

  • Ship the bucket's contents each night on a Transfer Appliance and load the files into BigQuery after arrival.
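The first option's scheduled, code-free load can be configured entirely from the `bq` CLI. The sketch below is illustrative, not part of the question: the bucket path, dataset, table, and display names are placeholder assumptions, and the exact `params` you need depend on your CSV layout.

```shell
# Hedged sketch: create a BigQuery Data Transfer Service config that loads
# CSVs from a Cloud Storage prefix into BigQuery on a daily schedule.
# All names below (bucket, dataset, table) are illustrative placeholders.
bq mk --transfer_config \
  --data_source=google_cloud_storage \
  --target_dataset=patient_metrics \
  --display_name="Daily patient monitoring load" \
  --schedule="every day 03:00" \
  --params='{
    "data_path_template": "gs://example-monitoring-bucket/exports/*.csv",
    "destination_table_name_template": "daily_metrics${run_time|\"%Y%m%d\"}",
    "file_format": "CSV",
    "skip_leading_rows": "1",
    "write_disposition": "APPEND"
  }'
```

The `destination_table_name_template` here uses a partition decorator with the `run_time` parameter so each run targets that day's partition of a date-partitioned table; a recurring transfer does not re-load files it has already transferred, which matches the "detect only newly arrived objects" requirement.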

GCP Professional Data Engineer
Objective: Ingesting and processing the data