
GCP Professional Data Engineer Practice Question

Every night, a fintech company receives a CSV file containing millions of transactions in the Cloud Storage folder gs://txn-ingress/daily/. The required workflow is:

  1. Detect when the file for a given day arrives.
  2. Launch an existing Dataflow Flex Template that cleanses and enriches the file and writes the results to a BigQuery table. Downstream tasks must wait until this job finishes.
  3. After the Dataflow job succeeds, start a Cloud DLP inspection job on the new BigQuery table.
  4. If the DLP job reports any HIGH-risk findings, invoke a Cloud Function that masks the affected columns; otherwise end the DAG.

Non-functional constraints:
  • Orchestration must be implemented in Cloud Composer with minimal operational overhead.
  • Transient failures must be retried with exponential back-off.
  • Only one run of this DAG is allowed per calendar day, regardless of how many files arrive.

Which Cloud Composer design best satisfies all requirements?

  • Replace Cloud Composer with Cloud Workflows triggered by Cloud Storage events; sequence the Dataflow, DLP, and Cloud Function calls in YAML definitions using built-in retries.

  • Deploy a Cloud Composer 1 environment, poll Pub/Sub notifications with PubSubPullSensor, launch the Dataflow job via a BashOperator running gcloud commands, and create the DLP job with a BigQueryOperator, while keeping default concurrency settings.

  • Use Cloud Scheduler to trigger the Dataflow template and have a Cloud Function start the DLP job; only invoke Cloud Composer when masking is required, relying on default retries and allowing multiple overlapping DAG runs.

  • Define the DAG in a Cloud Composer 2 environment. Add a GoogleCloudStoragePrefixSensor for gs://txn-ingress/daily/, call DataflowStartFlexTemplateOperator with wait_until_finished=True, then CloudDLPCreateJobOperator. Use a BranchPythonOperator to check DLP findings and, if any HIGH-risk results are found, invoke a Cloud Function via GoogleCloudFunctionsOperator; otherwise finish the DAG. Configure max_active_runs=1 and retry_exponential_backoff=True in default_args.
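The branching step in the last option is the part candidates most often get wrong, so here is a minimal sketch of what the BranchPythonOperator's callable might look like. The task ids (invoke_masking_function, end_dag) and the shape of the DLP findings are illustrative assumptions, not the exam's canonical API; a real DAG would parse the findings from the DLP job's inspection result.

```python
# Sketch of a BranchPythonOperator callable for the winning design.
# It inspects Cloud DLP findings and returns the task_id to follow.
# Finding dicts of the form {"infoType": ..., "likelihood": ...} are
# an assumed, simplified representation of the DLP job output.

def choose_next_task(dlp_findings):
    """Return the downstream task_id based on DLP finding likelihoods.

    DLP reports likelihoods such as POSSIBLE, LIKELY, and VERY_LIKELY;
    here LIKELY and VERY_LIKELY are treated as the "HIGH-risk" results
    that must trigger column masking.
    """
    HIGH_RISK = {"LIKELY", "VERY_LIKELY"}
    if any(f.get("likelihood") in HIGH_RISK for f in dlp_findings):
        return "invoke_masking_function"  # Cloud Function masks the columns
    return "end_dag"                      # no HIGH-risk findings: finish

# Inside the DAG this would be wired up roughly as:
#   branch = BranchPythonOperator(
#       task_id="check_dlp_findings",
#       python_callable=lambda ti: choose_next_task(
#           ti.xcom_pull(task_ids="run_dlp_inspection")),
#   )
# with max_active_runs=1 on the DAG and retries plus
# retry_exponential_backoff=True in default_args, per the option text.
```

Note that the rest of the option's requirements are satisfied by configuration rather than code: max_active_runs=1 enforces the one-run-per-day constraint, and retry_exponential_backoff=True in default_args handles transient failures.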

Objective: Ingesting and processing the data