AWS Certified Data Engineer Associate DEA-C01 Practice Question

An AWS Glue ETL job reads daily CSV files from an S3 prefix and loads the data into an Amazon Redshift staging table. Occasionally, the upstream system retransmits an earlier file, causing duplicate rows in the warehouse. The data engineering team wants an automated, low-maintenance way to keep the dataset consistent by skipping any file that has already been processed. Which approach will best meet this requirement?

  • Configure Amazon S3 server-side encryption (SSE-S3) on the input bucket to prevent duplicate loads.

  • Enable AWS Glue job bookmarks so the job automatically skips previously processed S3 objects.

  • Run the crawler less frequently so that earlier files are unlikely to be picked up twice.

  • Add a Spark dropDuplicates transformation in the ETL script after reading the CSV files.
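Enabling AWS Glue job bookmarks is the option that meets the requirement: bookmarks persist per-source state between job runs, so on each run Glue reads only S3 objects it has not processed before and silently skips retransmitted files. The other options fall short: SSE-S3 only encrypts objects at rest and has no effect on duplicate loads; running the crawler less often merely changes when files are cataloged, not what the ETL job reads; and dropDuplicates removes duplicates only within the data read in the current run, so it cannot detect rows already loaded into Redshift by an earlier run.

Below is a minimal PySpark sketch of such a Glue job with bookmarks in use. It assumes bookmarks are enabled on the job (the --job-bookmark-option job-bookmark-enable job parameter) and uses hypothetical names for the bucket, staging table, and Redshift connection:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # restores bookmark state from the previous run

# transformation_ctx ties this source to the bookmark: Glue compares the S3
# objects under the prefix against the stored bookmark and reads only the
# objects it has not seen before, so a retransmitted file is skipped.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/daily/"]},  # hypothetical prefix
    format="csv",
    format_options={"withHeader": True},
    transformation_ctx="daily_csv_source",
)

# Load into the Redshift staging table via a Glue connection (hypothetical names).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "staging.daily_events", "database": "analytics"},
    redshift_tmp_dir="s3://example-bucket/temp/",
)

job.commit()  # persists the updated bookmark so processed files stay skipped

Note that both pieces matter: the job parameter turns bookmarks on, while transformation_ctx and job.commit() in the script give Glue a key to track state and a point at which to save it. If job.commit() is never called, the bookmark does not advance and the same files are re-read on the next run.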
