GCP Professional Data Engineer Practice Question

Your retail company runs a nightly Dataflow batch job that loads 3 TB of CSV files from a Cloud Storage bucket in europe-west1 into BigQuery. Corporate policy mandates that, if the primary region becomes unavailable, the pipeline must be restarted in another region within 30 minutes without recompiling or rebuilding the code. Which design best satisfies this disaster-recovery objective while aligning with Google-recommended practices?

  • Store the pipeline as a Dataflow Flex Template in a multi-regional Cloud Storage bucket, monitor the job with Cloud Monitoring, and trigger a Cloud Function to launch the same template in europe-west4 when the primary job fails.

  • Redesign the pipeline to use Cloud Spanner for state management; Spanner's multi-region replication will allow the existing job to keep running even if europe-west1 fails.

  • Start the job with the --automaticFailover flag so Dataflow transparently restarts the pipeline in the nearest healthy region during an outage.

  • Convert the batch job to a streaming pipeline, enable hourly Dataflow snapshots, and restore the latest snapshot in europe-west4 if europe-west1 becomes unavailable.
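
For context, the mechanism described in the first option, launching an existing Flex Template in a different region, is exposed through the Dataflow flexTemplates.launch API. The sketch below shows one way a Pub/Sub-triggered Cloud Function might re-launch the same template in europe-west4; the project ID, bucket paths, job name, and pipeline parameters are illustrative assumptions, not values from the question.

```python
# Hypothetical Cloud Function sketch: re-launch the same Flex Template in a
# secondary region (europe-west4) after the primary europe-west1 job fails.
# Project ID, bucket paths, job name, and parameters are assumptions.
from googleapiclient.discovery import build

PROJECT_ID = "my-retail-project"                      # assumed project ID
FAILOVER_REGION = "europe-west4"                      # secondary region from the scenario
TEMPLATE_SPEC = "gs://my-multiregion-bucket/templates/nightly-load.json"  # spec in a multi-regional bucket


def launch_failover_job(event, context):
    """Triggered by a Pub/Sub message from a Cloud Monitoring alert when the
    primary job fails; starts the identical Flex Template in the failover region."""
    dataflow = build("dataflow", "v1b3", cache_discovery=False)

    body = {
        "launchParameter": {
            "jobName": "nightly-csv-load-failover",
            "containerSpecGcsPath": TEMPLATE_SPEC,
            "parameters": {
                # Same pipeline options as the primary run; values are assumptions.
                "inputFilePattern": "gs://my-multiregion-bucket/exports/*.csv",
                "outputTable": "my-retail-project:sales.daily_load",
            },
        }
    }

    request = (
        dataflow.projects()
        .locations()
        .flexTemplates()
        .launch(projectId=PROJECT_ID, location=FAILOVER_REGION, body=body)
    )
    response = request.execute()
    print(f"Launched failover job: {response['job']['id']}")
    return response
```

Because the template spec and staged artifacts live in a multi-regional bucket, the launch does not depend on europe-west1 being reachable, which is what allows a restart within the 30-minute window without rebuilding any code.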

GCP Professional Data Engineer
Designing data processing systems