AWS Certified Data Engineer Associate DEA-C01 Practice Question

Every 5 minutes, compressed JSON clickstream files (~200 MB) land in an Amazon S3 prefix. You must add a session-duration column, convert the data to partitioned Parquet, and write it back to S3 so that Athena can query it within 10 minutes of arrival. The team needs a fully managed, auto-scaling, pay-per-use solution with minimal operational overhead. Which approach satisfies these requirements?

  • Launch an auto-scaling Amazon EMR cluster and schedule a Spark step every 5 minutes to process new files and write Parquet to S3.

  • Set up an S3 event to invoke an AWS Lambda function that reads each file, performs the transformation in memory, and stores the Parquet output in S3.

  • Create an external table in Amazon Redshift Spectrum on the JSON data, run a CTAS query every 5 minutes to convert to Parquet, and store the output in S3.

  • Use Amazon EventBridge to trigger an Amazon EMR Serverless Spark job that reads the new file from S3, enriches and converts it to partitioned Parquet, and writes the result back to S3.
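
Whichever service runs it, the transformation the question describes (derive a session duration, then pick a partition for the output) can be sketched in plain Python. This is a minimal illustration only: the field names `session_id` and `ts` (epoch seconds), the newline-delimited gzip layout, and the `dt=/hour=` partition scheme are assumptions that do not come from the question, and the sketch stops short of writing Parquet, which would need a library such as pyarrow or a Spark job.

```python
import gzip
import json
from datetime import datetime, timezone


def read_gzipped_json_lines(raw_bytes):
    """Decode one gzip-compressed, newline-delimited JSON file (assumed layout)."""
    text = gzip.decompress(raw_bytes).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def add_session_duration(events):
    """Return copies of the events with a session_duration_s field added.

    Duration is last timestamp minus first timestamp per session, computed
    only over the events passed in (sessions spanning several files would
    need state that this sketch does not keep).
    """
    first_seen = {}
    last_seen = {}
    for event in events:
        sid, ts = event["session_id"], event["ts"]  # hypothetical field names
        first_seen[sid] = min(ts, first_seen.get(sid, ts))
        last_seen[sid] = max(ts, last_seen.get(sid, ts))
    return [
        {**event,
         "session_duration_s": last_seen[event["session_id"]]
         - first_seen[event["session_id"]]}
        for event in events
    ]


def partition_path(event):
    """Build a Hive-style partition prefix from the event time (assumed scheme)."""
    t = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"dt={t:%Y-%m-%d}/hour={t:%H}"
```

Hive-style `key=value` prefixes are what let Athena prune partitions, so the output key would be the destination prefix followed by the value of `partition_path` and a Parquet file name.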

Data Ingestion and Transformation