AWS Certified Data Engineer Associate DEA-C01 Practice Question

An e-commerce company lands daily CSV order files in Amazon S3, and an AWS Glue Spark job loads the data into Amazon Redshift. Each record must contain a non-null customer_id and an order_total greater than 0. If more than 0.5% of rows break either rule, the pipeline must halt and alert operations; otherwise the load continues. What is the most efficient way to add this validation with minimal new code?

  • Insert an AWS Glue Data Quality transform with a ruleset that stops the job when more than 0.5% of rows fail completeness and custom checks (see the sketch after these options).

  • Create CHECK constraints on the Redshift target table so the COPY command rejects any rows with null customer_id or non-positive order_total.

  • Run an AWS Step Functions workflow that executes an Athena query after loading to count invalid rows and rolls back the transaction if the 0.5% limit is exceeded.

  • Add a DataBrew profile job to scan the CSV files before every Glue run and trigger the Glue job only if the profile shows fewer than 0.5% invalid rows.
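For reference, the Glue Data Quality approach in the first option needs only a short DQDL ruleset plus the EvaluateDataQuality transform inside the existing Glue Spark job. The sketch below assumes AWS Glue 4.0, where the awsgluedq module ships with the runtime; the S3 path, evaluation-context name, and the exception used to halt the job are illustrative, not a definitive implementation.

from awsglue.context import GlueContext
from pyspark.context import SparkContext
from awsgluedq.transforms import EvaluateDataQuality

glue_context = GlueContext(SparkContext.getOrCreate())

# Daily CSV order files landed in S3 (path is illustrative).
orders = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/orders/"]},
    format="csv",
    format_options={"withHeader": True},
)

# CSV columns arrive as strings; cast order_total so the numeric rule applies.
orders = orders.resolveChoice(specs=[("order_total", "cast:double")])

# DQDL ruleset: each rule must hold for at least 99.5% of rows, so the
# check fails exactly when more than 0.5% of rows break either rule.
ruleset = """
Rules = [
    Completeness "customer_id" >= 0.995,
    ColumnValues "order_total" > 0 with threshold >= 0.995
]
"""

# Evaluate the ruleset; the result is a DynamicFrame with one row per rule
# and an Outcome column of Passed or Failed.
rule_outcomes = EvaluateDataQuality.apply(
    frame=orders,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_dq",  # illustrative name
        "enableDataQualityResultsPublishing": True,
    },
)

# Halt before the Redshift load if any rule failed; the resulting job
# failure can drive an operations alert (e.g., via an EventBridge rule).
if rule_outcomes.toDF().filter("Outcome = 'Failed'").count() > 0:
    raise RuntimeError("Data quality gate failed: more than 0.5% of rows are invalid")

When the job is built visually in Glue Studio, the same transform exposes a fail-the-job data quality action, so the exception-raising step can be configured rather than hand-written.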
