AWS Certified Data Engineer Associate DEA-C01 Practice Question
An e-commerce company stores daily transaction CSV files in Amazon S3. The downstream ML pipeline fails whenever numeric columns contain null or non-numeric values. You need an automated, low-code solution that validates and cleans each new file, stores the corrected data in a curated S3 prefix, and provides a summary of invalid records. Which approach requires the least operational effort?
Invoke an Amazon Athena CTAS query from an AWS Lambda function each day to select only valid rows into a new table stored in a different S3 prefix and publish results to Amazon SNS.
Build a custom Docker image that uses pandas to clean the files and run it daily with AWS Batch, writing logs of invalid rows to Amazon CloudWatch Logs.
Spin up an Amazon EMR cluster running Apache Spark, develop a PySpark script to validate and cleanse the dataset, and schedule the job with AWS Step Functions.
Create an AWS Glue DataBrew project that applies data quality rules, schedule a recipe job to output cleaned data to a curated S3 prefix, and rely on the job run metrics for the invalid-row summary.
The AWS Glue DataBrew option is correct. DataBrew is a serverless, visual data-preparation service, so there is no cluster, container image, or custom code to maintain. Data quality rules (for example, "column is not null" and "column is numeric") are defined in a ruleset attached to the dataset; a profile job evaluates them and writes a validation report, which serves as the summary of invalid records. A recipe built in a DataBrew project fixes or drops the invalid rows, and a scheduled recipe job writes the cleansed dataset to the curated S3 prefix. Job run status and logs are available in the DataBrew console and in Amazon CloudWatch Logs. Athena with Lambda, EMR with Spark, and AWS Batch all require custom code and additional infrastructure to build the same validation logic, so they involve more operational overhead than DataBrew.
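To make the two example rules concrete, here is a minimal local sketch in plain Python of what "column is not null" and "column is numeric" check, and of the kind of invalid-record summary the question asks for. It is not DataBrew code and does not call any AWS API. The function names (`is_numeric`, `clean_csv`), the sample columns (`order_id`, `amount`, `quantity`), and the summary layout are all hypothetical, chosen only for illustration.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical sample file; the column names are assumptions for illustration.
SAMPLE = """order_id,amount,quantity
1001,19.99,2
1002,,1
1003,abc,3
1004,5.00,N/A
1005,12.50,4
"""


def is_numeric(value):
    """Return True if value parses as a finite number."""
    try:
        return math.isfinite(float(value))
    except ValueError:
        return False


def clean_csv(text, numeric_columns):
    """Split rows into valid ones and a summary of rule failures.

    A row is dropped if any listed column is empty ("not null" rule)
    or does not parse as a finite number ("is numeric" rule).
    """
    reader = csv.DictReader(io.StringIO(text))
    valid_rows = []
    failures = Counter()
    invalid_rows = 0
    for row in reader:
        row_ok = True
        for col in numeric_columns:
            # Missing fields come back as None, so normalize to "".
            raw = (row.get(col) or "").strip()
            if raw == "":
                failures[f"{col}: null"] += 1
                row_ok = False
            elif not is_numeric(raw):
                failures[f"{col}: non-numeric"] += 1
                row_ok = False
        if row_ok:
            valid_rows.append(row)
        else:
            invalid_rows += 1
    summary = {
        "total_rows": len(valid_rows) + invalid_rows,
        "valid_rows": len(valid_rows),
        "invalid_rows": invalid_rows,
        "failures": dict(failures),
    }
    return valid_rows, summary


if __name__ == "__main__":
    rows, summary = clean_csv(SAMPLE, ["amount", "quantity"])
    print(summary)
```

In DataBrew the same checks would be configured visually rather than coded, which is the point of the "least operational effort" requirement. The sketch only shows the logic that each of the other three options would have to implement and maintain by hand.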
AWS Certified Data Engineer Associate DEA-C01
Data Operations and Support