AWS Certified Data Engineer Associate DEA-C01 Practice Question

Every 5 minutes, compressed JSON clickstream files (~200 MB) land in an Amazon S3 prefix. You must add a session-duration column, convert the data to partitioned Parquet, and write it back to S3 so that it can be queried with Athena within 10 minutes of arrival. The team needs a fully managed, auto-scaling, pay-per-use solution with minimal operational overhead. Which approach satisfies these requirements?

Launch an auto-scaling Amazon EMR cluster and schedule a Spark step every 5 minutes to process new files and write Parquet to S3.
Use Amazon EventBridge to trigger an Amazon EMR Serverless Spark job that reads the new file from S3, enriches and converts it to partitioned Parquet, and writes the result back to S3.
Set up an S3 event to invoke an AWS Lambda function that reads each file, performs the transformation in memory, and stores the Parquet output in S3.
Create an external table in Amazon Redshift Spectrum on the JSON data, run a CTAS query every 5 minutes to convert to Parquet, and store the output in S3.
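
To make the transformation in the question concrete, here is a minimal pure-Python sketch of its two pieces: deriving a session-duration value for each event and building a Hive-style partition prefix for the output. The field names `session_id` and `ts` (epoch seconds), the definition of duration as last event time minus first event time, and the `year=/month=/day=/hour=` layout are all assumptions made for illustration; the question does not specify the schema or the partition scheme, and a real job would do this in an engine such as Spark rather than in plain Python.

```python
from datetime import datetime, timezone


def add_session_duration(events):
    """Return copies of the events with a session_duration_s field added.

    Assumes each event is a dict with hypothetical keys 'session_id' and
    'ts' (epoch seconds). Duration = last ts - first ts within a session.
    """
    bounds = {}  # session_id -> (earliest ts, latest ts)
    for e in events:
        lo, hi = bounds.get(e["session_id"], (e["ts"], e["ts"]))
        bounds[e["session_id"]] = (min(lo, e["ts"]), max(hi, e["ts"]))

    enriched = []
    for e in events:
        lo, hi = bounds[e["session_id"]]
        enriched.append({**e, "session_duration_s": hi - lo})
    return enriched


def partition_prefix(ts):
    """Build a Hive-style partition prefix (assumed layout) from epoch seconds."""
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"year={d:%Y}/month={d:%m}/day={d:%d}/hour={d:%H}"


if __name__ == "__main__":
    sample = [
        {"session_id": "a", "ts": 0},
        {"session_id": "a", "ts": 90},
        {"session_id": "b", "ts": 30},
    ]
    for row in add_session_duration(sample):
        print(partition_prefix(row["ts"]), row)
```

Hive-style `key=value` prefixes are a layout Athena can register as table partitions, which lets queries that filter on those keys skip unrelated files.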

Data Ingestion and Transformation
