AWS Certified Data Engineer Associate DEA-C01 Practice Question

An analytics team receives new CSV files each night in the s3://sales/raw/ prefix. They must run a PySpark script that cleanses the data and writes partitioned Parquet files to s3://sales/curated/. The solution must 1) run automatically on a fixed schedule, 2) avoid reprocessing files that have already been converted, and 3) minimize infrastructure management. Which approach meets these requirements?

  • Use an Amazon EMR cluster launched each night by an AWS Lambda function, run the PySpark script as a step, then terminate the cluster.

  • Configure an AWS Glue Spark job that reads the raw prefix, enable job bookmarks, and invoke the job nightly with an Amazon EventBridge cron schedule.

  • Create an Amazon Athena scheduled query that uses CTAS to read the CSV files and write Parquet output to the curated prefix.

  • Load the CSV files into Amazon Redshift with the COPY command and run SQL transformations on a schedule using Amazon Redshift scheduled queries.
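For context on the mechanisms the options reference: a Glue Spark job enables bookmarks through the `--job-bookmark-option` default argument, and an EventBridge schedule is just a cron expression. A minimal sketch of those two pieces follows; the job name, role ARN, and script path are illustrative assumptions, and with boto3 the dictionaries would be passed to `glue.create_job(...)` and `events.put_rule(...)`.

```python
# Sketch of a Glue job definition with bookmarks enabled.
# Names, ARNs, and paths are hypothetical placeholders.
glue_job_args = {
    "Name": "sales-csv-to-parquet",
    "Role": "arn:aws:iam::123456789012:role/GlueSalesRole",  # hypothetical role
    "Command": {
        "Name": "glueetl",  # Spark job type
        "ScriptLocation": "s3://sales/scripts/cleanse.py",  # hypothetical script path
        "PythonVersion": "3",
    },
    "DefaultArguments": {
        # Job bookmarks track processed files so reruns skip them
        "--job-bookmark-option": "job-bookmark-enable",
    },
    "GlueVersion": "4.0",
}

# Nightly EventBridge rule at 02:00 UTC (cron fields: min hr day month dow year)
schedule_expression = "cron(0 2 * * ? *)"
```

The bookmark argument is what satisfies the "avoid reprocessing" requirement without any custom state tracking.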

Domain: Data Operations and Support