AWS Certified Data Engineer Associate DEA-C01 Practice Question

An e-commerce company ingests daily order CSV files into Amazon S3. A Python AWS Glue job converts them to Parquet and loads them into Amazon Redshift. The team must ensure that at least 98% of rows contain a non-null customer_email value and must block the load if the threshold is not met, while adding minimal new code within the Glue workflow. Which solution meets these requirements?

  • Invoke an Amazon Deequ validation script on an Amazon EMR cluster via AWS Step Functions before the Glue job; run the Glue workflow only if the script succeeds.

  • Run an AWS Glue DataBrew profile job after each file arrives, send the profile results to CloudWatch, and use a CloudWatch alarm to invoke an AWS Lambda function that stops the Glue job when the error rate exceeds 2%.

  • Add an AWS Glue Data Quality transform node to the existing job, define a ruleset that enforces 98% completeness on customer_email, and configure the job to fail when the rule is violated.

  • Schedule an Amazon Athena query with Amazon EventBridge to count rows with null customer_email values and publish an SNS alert so an operator can cancel the Glue workflow when necessary.
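
To make the 98% completeness requirement concrete, the sketch below shows the kind of DQDL rule a Glue Data Quality ruleset could contain, next to a plain-Python illustration of what that rule measures. The names RULESET, completeness, and gate_load are hypothetical and exist only for this example; in an actual Glue job the ruleset string would be handed to the Data Quality transform rather than re-implemented by hand, and the exact wiring depends on how the job is authored.

```python
# Illustrative only: a DQDL rule expressing "at least 98% of customer_email
# values are non-null". In a Glue job this string would be given to the
# Data Quality transform; nothing below calls AWS.
RULESET = 'Rules = [ Completeness "customer_email" >= 0.98 ]'


def completeness(rows, column):
    """Return the fraction of rows whose `column` value is not None."""
    if not rows:
        return 0.0
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return non_null / len(rows)


def gate_load(rows, column="customer_email", threshold=0.98):
    """Raise ValueError when completeness is below `threshold`.

    Hypothetical stand-in for "fail the job when the rule is violated":
    an exception here would stop the run before the Redshift load step.
    """
    ratio = completeness(rows, column)
    if ratio < threshold:
        raise ValueError(
            f"{column} completeness {ratio:.2%} is below {threshold:.0%}"
        )
    return ratio
```

With 100 rows of which 2 have a null customer_email, gate_load returns the ratio and the load can proceed; with 3 nulls it raises and the load is blocked.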

Domain: Data Operations and Support