AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineering team schedules an AWS Glue Spark job through Amazon EventBridge to transform and load daily CSV files from an S3 landing prefix into a partitioned analytics bucket. The job writes with append mode, and Athena reports sometimes reveal duplicate rows for the same day even though the source files are never modified. Which change will most effectively prevent these duplicates while keeping the pipeline fully automated and cost-effective?
Configure an S3 lifecycle rule to delete files in the landing prefix immediately after the job finishes.
Enable AWS Glue job bookmarks so the job automatically ignores files it has already processed.
Change the Spark write operation to overwrite the existing date partition each day.
Add an AWS Step Functions state machine that calls Athena to delete duplicate records after each load completes.
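The correct answer is to enable AWS Glue job bookmarks. Bookmarks persist state about the input data that a job has already processed. When they are enabled, each run skips files that an earlier successful run already loaded, so a rerun or retry cannot append the same records twice. This prevents duplicates at the source and adds no extra services to the pipeline.
Overwriting the date partition each day could also remove duplicates, but it rewrites the whole partition on every run, which is less efficient, and it risks data loss if the job fails midway. An S3 lifecycle rule cannot delete landing files immediately after the job finishes: lifecycle expiration is configured in days and is not triggered by job completion, so the job can still re-read files that have already been copied, and the rule provides no guard against partial reruns. A Step Functions task that runs an Athena DELETE query adds cost and complexity and only corrects duplicates after they occur rather than preventing them. Athena also supports DELETE only on Apache Iceberg tables, so this option would not work against an ordinary table defined over files in S3.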
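The difference between a plain append and a bookmarked append can be illustrated with a small simulation. This is only a sketch of the idea, not Glue's implementation: `run_job`, `landing`, `target`, and `bookmark` are hypothetical names, and the bookmark is modeled as a simple set of file names. In a real Glue job, bookmarks are turned on with the job parameter `--job-bookmark-option job-bookmark-enable`, and the script needs a `transformation_ctx` on its sources plus `job.init()` and `job.commit()` calls for the state to be saved.

```python
from typing import Dict, List, Optional, Set


def run_job(
    landing: Dict[str, List[str]],
    target: List[str],
    bookmark: Optional[Set[str]] = None,
) -> int:
    """Append rows from landing files to target; return the number of rows written.

    `bookmark` is a hypothetical stand-in for Glue's bookmark state: a set of
    file names that earlier successful runs have already loaded.
    Passing None simulates a job with bookmarks disabled.
    """
    written = 0
    loaded_now = set()
    for name in sorted(landing):
        if bookmark is not None and name in bookmark:
            continue  # already processed by an earlier run, so skip it
        target.extend(landing[name])
        written += len(landing[name])
        loaded_now.add(name)
    if bookmark is not None:
        # State advances only at the end of a successful run,
        # loosely analogous to calling job.commit() in a Glue script.
        bookmark.update(loaded_now)
    return written


if __name__ == "__main__":
    landing = {"2024-01-01/sales.csv": ["row-1", "row-2"]}

    plain_target: List[str] = []
    run_job(landing, plain_target)
    run_job(landing, plain_target)  # rerun re-appends the same file
    print(len(plain_target))  # 4: every row is duplicated

    bookmarked_target: List[str] = []
    state: Set[str] = set()
    run_job(landing, bookmarked_target, state)
    run_job(landing, bookmarked_target, state)  # rerun skips the known file
    print(len(bookmarked_target))  # 2: no duplicates
```

The second run with a bookmark writes nothing because the file is already recorded, which is the behavior the explanation relies on. A file added to `landing` later would still be picked up, since only previously recorded names are skipped.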
Exam domain: Data Operations and Support