AWS Certified Data Engineer Associate DEA-C01 Practice Question

An AWS Glue ETL job reads daily CSV files from an S3 prefix and loads the data into an Amazon Redshift staging table. Occasionally, the upstream system retransmits an earlier file, causing duplicate rows in the warehouse. The data engineering team wants an automated, low-maintenance way to keep the dataset consistent by skipping any file that has already been processed. Which approach will best meet this requirement?

  • Configure Amazon S3 server-side encryption (SSE-S3) on the input bucket to prevent duplicate loads.

  • Enable AWS Glue job bookmarks so the job automatically skips previously processed S3 objects.

  • Run the crawler less frequently so that earlier files are unlikely to be picked up twice.

  • Add a Spark dropDuplicates transformation in the ETL script after reading the CSV files.
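Enabling AWS Glue job bookmarks is the option that meets the requirement: bookmarks persist per-source state between job runs, so on each run Glue reads only S3 objects it has not processed before and silently skips retransmitted files. The other options fall short: SSE-S3 only encrypts objects at rest and has no effect on duplicate loads; running the crawler less often merely changes when files are cataloged, not what the ETL job reads; and dropDuplicates removes duplicates only within the data read in the current run, so it cannot detect rows already loaded into Redshift by an earlier run.

Below is a minimal PySpark sketch of such a Glue job with bookmarks in use. It assumes bookmarks are enabled on the job (the --job-bookmark-option job-bookmark-enable job parameter) and uses hypothetical names for the bucket, staging table, and Redshift connection:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # restores bookmark state from the previous run

# transformation_ctx ties this source to the bookmark: Glue compares the S3
# objects under the prefix against the stored bookmark and reads only the
# objects it has not seen before, so a retransmitted file is skipped.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/daily/"]},  # hypothetical prefix
    format="csv",
    format_options={"withHeader": True},
    transformation_ctx="daily_csv_source",
)

# Load into the Redshift staging table via a Glue connection (hypothetical names).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "staging.daily_events", "database": "analytics"},
    redshift_tmp_dir="s3://example-bucket/temp/",
)

job.commit()  # persists the updated bookmark so processed files stay skipped

Note that both pieces matter: the job parameter turns bookmarks on, while transformation_ctx and job.commit() in the script give Glue a key to track state and a point at which to save it. If job.commit() is never called, the bookmark does not advance and the same files are re-read on the next run.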
