AWS Certified Data Engineer Associate DEA-C01 Practice Question
An e-commerce company ingests daily order CSV files into Amazon S3. A Python AWS Glue job converts them to Parquet and loads the results into Amazon Redshift. The team must ensure that at least 98% of rows contain a non-null customer_email value and must block the load if the threshold is not met, while adding minimal new code within the Glue workflow. Which solution meets these requirements?
Run an AWS Glue DataBrew profile job after each file arrives, send the profile results to CloudWatch, and use a CloudWatch alarm to invoke an AWS Lambda function that stops the Glue job when the error rate exceeds 2%.
Schedule an Amazon Athena query with Amazon EventBridge to count rows with null customer_email values and publish an SNS alert so an operator can cancel the Glue workflow when necessary.
Invoke an Amazon Deequ validation script on an Amazon EMR cluster via AWS Step Functions before the Glue job; run the Glue workflow only if the script succeeds.
Add an AWS Glue Data Quality transform node to the existing job, define a ruleset that enforces 98% completeness on customer_email, and configure the job to fail when the rule is violated.
AWS Glue provides built-in Data Quality (DQ) capabilities. A DQ transform node can be added to the existing Glue job and linked to a ruleset that asserts a Completeness rule on the customer_email column with a 98% threshold. Configuring the node's action to fail the job when the ruleset fails (the "Fail job without loading target data" option) prevents the load to Redshift, with no additional Spark code or external orchestration required.
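To make the 98% completeness check concrete, here is a minimal plain-Python sketch of the logic such a rule expresses. It is an illustration only, not Glue's implementation and not code the job would need: in Glue the check is declared as a DQDL rule (the ruleset string below shows the expected form) and evaluated by the service. The function names, the ValueError, and the decision to treat an empty input as failing are assumptions made for this sketch.

```python
from typing import Iterable, Mapping, Optional

# DQDL ruleset text as it could be entered in the Data Quality node.
# Shown for reference only; nothing in this sketch parses it.
DQDL_RULESET = 'Rules = [ Completeness "customer_email" >= 0.98 ]'


def completeness(rows: Iterable[Mapping[str, Optional[str]]], column: str) -> float:
    """Return the fraction of rows whose `column` value is not None."""
    total = 0
    non_null = 0
    for row in rows:
        total += 1
        if row.get(column) is not None:
            non_null += 1
    # Assumption of this sketch: an empty input counts as 0% complete,
    # so the gate below blocks the load instead of passing vacuously.
    return non_null / total if total else 0.0


def enforce_completeness(
    rows: Iterable[Mapping[str, Optional[str]]],
    column: str,
    threshold: float = 0.98,
) -> float:
    """Raise ValueError (hypothetical stand-in for failing the job)
    when completeness falls below `threshold`; otherwise return the ratio."""
    ratio = completeness(rows, column)
    if ratio < threshold:
        raise ValueError(
            f"{column} completeness {ratio:.2%} is below {threshold:.0%}; load blocked"
        )
    return ratio


if __name__ == "__main__":
    # 99 populated rows and 1 null row: 99% complete, so the gate passes.
    sample = [{"customer_email": "a@example.com"}] * 99 + [{"customer_email": None}]
    print(enforce_completeness(sample, "customer_email"))
```

With 97 populated rows and 3 nulls out of 100, the same call would raise, which corresponds to the job failing before anything is written to Redshift.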
Using EMR with Amazon Deequ requires writing and operating a separate Spark application plus Step Functions, which adds complexity. The Athena/EventBridge approach only alerts an operator and does not automatically stop the load. The DataBrew and CloudWatch method introduces multiple extra services and custom logic. Therefore, the Glue Data Quality transform node is the simplest integrated solution.
Exam domain: Data Operations and Support