Your team operates a Google Cloud Dataflow streaming pipeline that enriches click-stream events and writes the results to BigQuery. Last night a zonal outage caused several workers in the job to crash, but the pipeline kept running and automatically recovered once the zone became available again. A junior engineer proposes adding a Cloud Scheduler job that restarts the entire pipeline whenever a worker fails to "guarantee" availability. Considering Dataflow's built-in fault-tolerance features, what is the most appropriate response?
Add custom retry loops inside every DoFn to catch exceptions and pause processing until the failed worker is manually replaced.
Implement the proposed Cloud Scheduler job to shut down and relaunch the pipeline whenever a worker failure log entry appears.
Explain that Dataflow already detects worker loss and transparently spins up replacement workers, so no additional restart logic is required.
Schedule a nightly full pipeline restart to flush any partial state left by failed workers and avoid duplicate output rows.
Dataflow replaces failed workers automatically. In a streaming job, the service durably checkpoints in-flight state, and with Streaming Engine that state and shuffle data live in the Dataflow service backend rather than on the worker VMs, so work is simply reassigned to healthy workers without restarting the job. Adding an external restart mechanism would introduce unnecessary downtime and could cause duplicate processing. The correct action is therefore to rely on Dataflow's automatic worker replacement and leave the existing job running. The other options either restart the pipeline unnecessarily, ignore how Dataflow manages state, or add custom logic that does not address worker loss.
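To make the point concrete, here is a minimal sketch of how such a streaming job might be launched with the Apache Beam Python SDK. The project ID, Pub/Sub topic, BigQuery table, and enrichment logic are hypothetical placeholders; the takeaway is that the pipeline is submitted once with streaming options, and no restart logic appears anywhere because worker recovery is handled by the Dataflow service itself.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical project, topic, and table names for illustration only.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    streaming=True,                # streaming mode: the service checkpoints in-flight state
    enable_streaming_engine=True,  # keep state/shuffle in the Dataflow backend, not on worker VMs
)

def enrich(event: dict) -> dict:
    """Placeholder enrichment step; real logic would join against reference data."""
    event["enriched"] = True
    return event

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadClicks" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/clicks")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Enrich" >> beam.Map(enrich)
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.click_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )

If a worker VM in this job fails, the Dataflow service provisions a replacement and reassigns its work items; nothing in the pipeline code, and no external scheduler, needs to react to that event.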