A fintech company receives a CSV file with millions of transactions every night in the Cloud Storage folder gs://txn-ingress/daily/. The required workflow is:
Detect when the file for a given day arrives.
Launch an existing Dataflow Flex Template that cleanses and enriches the file and writes the results to a BigQuery table. Downstream tasks must wait until this job finishes.
After the Dataflow job succeeds, start a Cloud DLP inspection job on the new BigQuery table.
If the DLP job reports any HIGH-risk findings, invoke a Cloud Function that masks the affected columns; otherwise end the DAG.
Non-functional constraints:
Orchestration must be implemented in Cloud Composer with minimal operational overhead.
Transient failures must be retried with exponential back-off.
Only one run of this DAG is allowed per calendar day, regardless of how many files arrive.
Which Cloud Composer design best satisfies all requirements?
Deploy a Cloud Composer 1 environment, poll Pub/Sub notifications with PubSubPullSensor, launch the Dataflow job via a BashOperator running gcloud commands, and create the DLP job with a BigQueryOperator, while keeping default concurrency settings.
Use Cloud Scheduler to trigger the Dataflow template and have a Cloud Function start the DLP job; only invoke Cloud Composer when masking is required, relying on default retries and allowing multiple overlapping DAG runs.
Define the DAG in a Cloud Composer 2 environment. Add a GoogleCloudStoragePrefixSensor for gs://txn-ingress/daily/, call DataflowStartFlexTemplateOperator with wait_until_finished=True, then CloudDLPCreateJobOperator. Use a BranchPythonOperator to check DLP findings and, if any HIGH-risk results are found, invoke a Cloud Function via CloudFunctionInvokeFunctionOperator; otherwise finish the DAG. Configure max_active_runs=1 on the DAG and retry_exponential_backoff=True in default_args.
Replace Cloud Composer with Cloud Workflows triggered by Cloud Storage events; sequence the Dataflow, DLP, and Cloud Function calls in YAML definitions using built-in retries.
Implement the DAG in Cloud Composer 2, which runs on GKE Autopilot and therefore keeps operational management to a minimum. Use GoogleCloudStoragePrefixSensor (or GoogleCloudStorageObjectSensor) to wait for the nightly CSV file in gs://txn-ingress/daily/; in the Airflow 2 Google provider these sensors are named GCSObjectsWithPrefixExistenceSensor and GCSObjectExistenceSensor. Launch the Dataflow Flex Template with DataflowStartFlexTemplateOperator and set wait_until_finished=True so that downstream tasks block until the job completes successfully. After completion, create a DLP inspection job with CloudDLPCreateJobOperator. A BranchPythonOperator can then examine the DLP findings and either route to the CloudFunctionInvokeFunctionOperator task that triggers the masking Cloud Function or end the workflow. Setting max_active_runs=1 on the DAG (it is a DAG argument, not a task argument) prevents overlapping runs, which together with a daily schedule limits the DAG to one run per calendar day. Setting retry_exponential_backoff=True on the tasks, along with a non-zero retries value, makes transient failures retry with exponential back-off. The alternative designs either move orchestration out of Cloud Composer, which violates the stated constraint, or use Cloud Composer 1 with gcloud commands in a BashOperator and default concurrency, which adds operational overhead and does not enforce the single-run or back-off requirements.
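The retry settings and the branching decision described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the reference solution for the question: the task IDs mask_columns and end, the helper choose_next_task, the shape of the findings summary (a list of dicts with likelihood and count keys), and the choice of which likelihood labels count as "HIGH-risk" are all hypothetical. The retry values are illustrative. In a real DAG the summary would first have to be read from the finished DLP job.

```python
from datetime import timedelta

# Task-level retry settings, intended to be passed as default_args to the DAG.
# The keys are standard Airflow task arguments; the values are assumptions.
DEFAULT_ARGS = {
    "retries": 3,
    "retry_delay": timedelta(minutes=2),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(minutes=30),
}

# DAG-level settings: these are arguments of the DAG itself, not default_args.
DAG_KWARGS = {
    "max_active_runs": 1,  # never more than one active run of this DAG
    "catchup": False,      # do not backfill missed schedule intervals
}

# Assumption: the labels treated as "HIGH-risk" for this illustration.
HIGH_RISK_LABELS = frozenset({"LIKELY", "VERY_LIKELY"})


def choose_next_task(findings, high_risk_labels=HIGH_RISK_LABELS):
    """Return the task_id to follow, as a branch callable would.

    findings is a hypothetical summary of the DLP job result, for example
    [{"likelihood": "VERY_LIKELY", "count": 4}].
    """
    for item in findings:
        is_high = item.get("likelihood") in high_risk_labels
        if is_high and int(item.get("count", 0)) > 0:
            return "mask_columns"  # hypothetical task that calls the function
    return "end"  # hypothetical no-op task that finishes the DAG
```

In a DAG, a function like choose_next_task would be supplied as the python_callable of the BranchPythonOperator, with the findings pulled from the DLP job rather than passed in directly. Keeping the decision in a pure function makes it easy to unit-test outside Airflow.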