AWS Certified Data Engineer Associate DEA-C01 Practice Question

An e-commerce company stores daily transaction CSV files in Amazon S3. The downstream ML pipeline fails whenever numeric columns contain null or non-numeric values. You need an automated, low-code solution that validates and cleans each new file, stores the corrected data in a curated S3 prefix, and provides a summary of invalid records. Which approach requires the least operational effort?

  • Invoke an Amazon Athena CTAS query from an AWS Lambda function each day to select only valid rows into a new table stored in a different S3 prefix and publish results to Amazon SNS.

  • Build a custom Docker image that uses pandas to clean the files and run it daily with AWS Batch, writing logs of invalid rows to Amazon CloudWatch Logs.

  • Spin up an Amazon EMR cluster running Apache Spark, develop a PySpark script to validate and cleanse the dataset, and schedule the job with AWS Step Functions.

  • Create an AWS Glue DataBrew project that applies data quality rules, schedule a recipe job to output cleaned data to a curated S3 prefix, and rely on the job run metrics for the invalid-row summary.
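Independent of which AWS service runs it, the core transformation the question describes is: coerce numeric columns, separate invalid rows, and report how many were dropped. A minimal pandas sketch of that logic (the column names and sample data are hypothetical, not from any AWS API):

```python
import io
import pandas as pd

def clean_numeric_csv(csv_text: str, numeric_cols: list[str]):
    """Coerce the given columns to numeric, drop rows with null or
    non-numeric values, and return (cleaned frame, invalid-row count)."""
    df = pd.read_csv(io.StringIO(csv_text))
    for col in numeric_cols:
        # Non-numeric entries become NaN so all bad values filter uniformly.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    valid = df.dropna(subset=numeric_cols)
    return valid.reset_index(drop=True), len(df) - len(valid)

# Hypothetical sample: row 2 has a non-numeric price, row 3 a missing quantity.
raw = "order_id,price,quantity\n1,9.99,2\n2,oops,1\n3,4.50,\n4,1.25,5\n"
cleaned, invalid_count = clean_numeric_csv(raw, ["price", "quantity"])
print(invalid_count)  # → 2
```

A DataBrew recipe job applies equivalent steps declaratively, which is why it counts as "low-code": the validation rules, the curated output location, and the invalid-row metrics are all configured rather than programmed.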

Domain: Data Operations and Support