AWS Certified Data Engineer Associate DEA-C01 Practice Question
An AWS Glue ETL job reads daily CSV files from an S3 prefix and loads the data into an Amazon Redshift staging table. Occasionally, the upstream system retransmits an earlier file, causing duplicate rows in the warehouse. The data engineering team wants an automated, low-maintenance way to keep the dataset consistent by skipping any file that has already been processed. Which approach will best meet this requirement?
Configure Amazon S3 server-side encryption (SSE-S3) on the input bucket to prevent duplicate loads.
Enable AWS Glue job bookmarks so the job automatically skips previously processed S3 objects.
Run the crawler less frequently so that earlier files are unlikely to be picked up twice.
Add a Spark dropDuplicates transformation in the ETL script after reading the CSV files.
AWS Glue job bookmarks persist state about the input a job has already processed. With bookmarks enabled, each run picks up only the S3 objects the bookmark has not yet recorded and skips the rest, so reruns do not reload old files and the job needs little or no extra code. The other options fall short. SSE-S3 only encrypts objects at rest and has no effect on which files are loaded. A crawler only updates Data Catalog metadata, so running it less often does not track which files were already loaded. A Spark dropDuplicates step removes duplicates only within the data read in the current run, not against rows already in Redshift, and it adds compute and maintenance overhead.
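As a rough illustration of how bookmarks are switched on, the sketch below starts a job run with the `--job-bookmark-option` job parameter set to `job-bookmark-enable` through the boto3 Glue client. The job name, the region, and both helper function names are hypothetical and not part of the question. The sketch also assumes the ETL script itself already calls `job.init()` and `job.commit()` and passes a `transformation_ctx` to its S3 source, which bookmarks rely on to record state.

```python
from typing import Any, Dict

# Glue special job parameter that controls bookmarks, and its "enable" value.
BOOKMARK_ARG = "--job-bookmark-option"
BOOKMARK_ENABLE = "job-bookmark-enable"


def build_run_request(job_name: str) -> Dict[str, Any]:
    """Build keyword arguments for glue.start_job_run with bookmarks enabled."""
    return {
        "JobName": job_name,
        "Arguments": {BOOKMARK_ARG: BOOKMARK_ENABLE},
    }


def start_bookmarked_run(job_name: str, region_name: str = "us-east-1") -> str:
    """Start one run of the job and return its JobRunId (needs AWS credentials)."""
    import boto3  # imported lazily so build_run_request works without boto3

    glue = boto3.client("glue", region_name=region_name)
    response = glue.start_job_run(**build_run_request(job_name))
    return response["JobRunId"]


if __name__ == "__main__":
    # "daily-csv-to-redshift" is a hypothetical job name used for illustration.
    print(build_run_request("daily-csv-to-redshift"))
```

The same parameter can also be set once as a default argument on the job definition, so that scheduled runs use bookmarks without passing it each time.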
Data Operations and Support