AWS Certified Data Engineer Associate DEA-C01 Practice Question
Your company ingests daily CSV files into an S3 data lake, runs an AWS Glue Spark job to denormalize the data, and then loads the result into an Amazon Redshift table that has a primary key on order_id. Duplicate order_id values occasionally appear in the source data and cause the Redshift load to fail. You must add an automated step to the existing Glue workflow that verifies the order_id column contains only unique values and stops the workflow if the rule is violated. Which approach satisfies these requirements with minimal custom code?
Create an AWS Glue Data Quality ruleset that uses an IsUnique rule on the order_id column and configure the evaluation action to fail the workflow when the rule fails.
Turn on AWS Glue job bookmarks so previously processed rows are skipped and duplicates are automatically removed during the next job run.
Enable versioning on the S3 bucket so duplicate files are stored as separate object versions, ensuring the downstream load receives only the latest data.
Use AWS Database Migration Service with a full-load task followed by validation to compare S3 data with Redshift and detect any duplicate records before loading.
AWS Glue Data Quality can be invoked from a Glue workflow to run a ruleset against a Data Catalog table before the load step proceeds. The IsUnique rule checks that every value in the specified column is distinct, so a single rule on order_id covers the requirement without custom validation code. If the rule fails and the evaluation action is set to fail the workflow, subsequent nodes are not executed, which keeps the bad data out of Redshift.
The other options never inspect column values. Job bookmarks keep a job from reprocessing input files it has already handled, but they neither detect duplicate keys within a file nor remove duplicate rows. S3 versioning only tracks changes to objects and does not check column-level uniqueness. AWS Database Migration Service validation is intended for database migrations, would require a separate service, and still would not stop the Glue workflow automatically.
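To make the uniqueness check concrete, here is a minimal sketch in plain Python. The `DQDL_RULESET` string shows how the rule from the correct answer is written in Data Quality Definition Language. Everything else is a local stand-in for illustration only, not how Glue implements the rule: the function names, the `DuplicateKeyError` exception, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter

# DQDL text for the rule described in the explanation.
DQDL_RULESET = 'Rules = [ IsUnique "order_id" ]'


class DuplicateKeyError(Exception):
    """Hypothetical exception used here to stop the pipeline on duplicates."""


def find_duplicates(rows, column):
    """Return the sorted values that appear more than once in `column`."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def assert_unique(rows, column):
    """Raise DuplicateKeyError if any value in `column` repeats."""
    duplicates = find_duplicates(rows, column)
    if duplicates:
        raise DuplicateKeyError(f"{column} has duplicate values: {duplicates}")


# Made-up sample data: order 1001 appears twice within the same file.
SAMPLE_CSV = "order_id,amount\n1001,25.00\n1002,14.50\n1001,25.00\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

try:
    assert_unique(rows, "order_id")
except DuplicateKeyError as err:
    print(f"Stopping before the load step: {err}")
```

The sample file contains the duplicate within a single object, which is the case that job bookmarks and S3 versioning cannot catch because neither looks at column values.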
Data Operations and Support