AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineering team needs to build a near-real-time pipeline that reads JSON events from Amazon Kinesis Data Streams, transforms the data with an AWS Glue 4.0 streaming job, and stores the output as partitioned Apache Parquet files in Amazon S3. The solution must minimize duplicate record processing and allow the job to restart automatically from the exact position where it stopped after any failure or worker re-deployment. Which approach will satisfy these requirements?
Enable continuous logging and set the job's Spark log level to INFO so Glue can replay uncommitted batches after a restart.
Specify an Amazon S3 checkpoint location for the Glue streaming job so Spark Structured Streaming can persist stream offsets.
Set the Spark streaming windowSize parameter to 100 seconds so the job can reprocess only the last window when it restarts.
Turn on AWS Glue Auto Scaling to keep at least one idle DPU worker, ensuring the stream position is maintained during failures.
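AWS Glue streaming jobs run on Apache Spark Structured Streaming. To resume from the exact position where it stopped, Spark must persist the stream offsets it has processed, so the correct approach is to specify an Amazon S3 path as the checkpoint location. The path is typically passed in as a job argument (for example, --checkpoint-location) and supplied as the checkpointLocation option of GlueContext.forEachBatch or of a writeStream call. Spark writes offset and commit metadata to this directory for every micro-batch, so when the Glue job restarts it reads the last committed offsets and resumes from that position instead of rereading the stream or skipping events. Checkpointing alone gives at-least-once processing; output approaches exactly-once only when the sink is also idempotent or transactional, which is why the requirement is phrased as minimizing duplicates. Continuous logging only improves observability, Auto Scaling only adjusts worker capacity and cost, and windowSize only sets how long each micro-batch window lasts. None of them records the stream position, so none of them guarantees recovery after a failure.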
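The checkpoint wiring described above can be sketched as a Glue PySpark script. This is a minimal illustration and not a reference implementation: the job arguments checkpoint_root and stream_arn, the bucket s3://example-bucket, the partition column event_date, and the helper names checkpoint_options and write_batch are all assumptions made for the example. The Glue libraries are imported inside main() because they exist only in the Glue runtime.

```python
import sys


def checkpoint_options(checkpoint_root, job_name, window="100 seconds"):
    """Build the options dict passed to GlueContext.forEachBatch.

    checkpointLocation is where Spark persists stream offsets; using one
    prefix per job keeps the checkpoints of different jobs apart.
    """
    root = checkpoint_root.rstrip("/")
    return {
        "windowSize": window,
        "checkpointLocation": f"{root}/{job_name}/checkpoint/",
    }


def write_batch(batch_df, batch_id, output_path="s3://example-bucket/events/"):
    """Write one micro-batch as partitioned Parquet.

    Assumes the transformed events carry an event_date column (hypothetical).
    """
    if batch_df.count() > 0:
        batch_df.write.mode("append").partitionBy("event_date").parquet(output_path)


def main():
    # Available only inside the AWS Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # checkpoint_root and stream_arn are hypothetical job arguments.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "checkpoint_root", "stream_arn"])
    glue_context = GlueContext(SparkContext.getOrCreate())

    events = glue_context.create_data_frame.from_options(
        connection_type="kinesis",
        connection_options={
            "streamARN": args["stream_arn"],
            # Used only on the first run; later runs resume from the checkpoint.
            "startingPosition": "TRIM_HORIZON",
            "inferSchema": "true",
            "classification": "json",
        },
    )

    glue_context.forEachBatch(
        frame=events,
        batch_function=write_batch,
        options=checkpoint_options(args["checkpoint_root"], args["JOB_NAME"]),
    )


# In the Glue job script, finish by calling main().
```

Because the append in write_batch is not idempotent, a micro-batch that is retried after a partial write could still produce duplicate files; the checkpoint narrows that window to a single batch rather than eliminating it.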
Data Ingestion and Transformation