AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company lands 2 TB of comma-separated log files in an Amazon S3 landing prefix every night at 01:00. Analysts query the data with Amazon Athena and need each night's records available for querying within 30 minutes of arrival, stored in a curated S3 prefix as Apache Parquet and partitioned by ingestion date. The data engineering team wants the lowest operational overhead and minimal compute costs while the nightly workload is idle. Which approach meets these requirements?
Spin up a long-running Amazon EMR cluster with Apache Spark. Schedule a daily step at 01:05 that converts the files to Parquet and writes them to the curated prefix, leaving the cluster running for the next day's job.
Configure an AWS Glue Spark job that is triggered when new files arrive. The job converts the CSV input to Parquet, partitions the output by date, and writes it to the curated S3 prefix; because Glue is serverless, no compute is billed while the job is not running.
Load the CSV data into an Amazon Redshift table each night, then run an UNLOAD command to write Parquet files partitioned by date back to S3 for Athena queries.
Invoke an AWS Lambda function from each S3 PUT event. The function uses pandas to read the CSV objects, convert them to Parquet, and store the results in the curated prefix.
AWS Glue provides a fully managed, serverless Spark environment that can be triggered by Amazon S3 event notifications or a scheduled AWS Glue workflow. When the landing files arrive, Glue automatically spins up workers, reads the CSV data from the landing prefix, and writes the output as Parquet files partitioned by date to the curated prefix. Workers shut down when the job finishes, so no resources accrue charges while the pipeline is idle.
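As an illustration, a minimal Glue PySpark script for this pattern might look like the following sketch. The bucket names, prefixes, and the header assumption are illustrative, not part of the question:

```python
# Minimal AWS Glue PySpark job sketch: convert landed CSV to
# date-partitioned Parquet. S3 paths here are hypothetical examples.
import sys
from datetime import datetime, timezone

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql.functions import lit

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical landing and curated prefixes.
landing_path = "s3://example-logs/landing/"
curated_path = "s3://example-logs/curated/"

ingestion_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")

# Read the raw CSV logs, stamp each row with the ingestion date,
# and write Parquet partitioned by that date.
df = spark.read.option("header", "true").csv(landing_path)
(
    df.withColumn("ingestion_date", lit(ingestion_date))
    .write.mode("append")
    .partitionBy("ingestion_date")
    .parquet(curated_path)
)

job.commit()
```

Writing with partitionBy("ingestion_date") produces Hive-style ingestion_date=YYYY-MM-DD prefixes that Athena can prune at query time, once the new partition is registered (for example, via a Glue crawler or partition projection).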
An always-on EMR cluster could perform the same transformation, but keeping the cluster running 24×7 would incur unnecessary EC2 costs and add operational management overhead. A Lambda function cannot reliably process 2 TB of data because Lambda invocations are limited to 15 minutes and 10 GB of memory, forcing complex fan-out coordination. Loading the data into Redshift and then UNLOADing to S3 would double the storage footprint, require a Redshift cluster to be online, and add an extra data movement step, increasing both cost and latency. Therefore, the AWS Glue job is the most cost-effective and operationally simple solution.
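If the team preferred the scheduled route mentioned above (kicking the job off shortly after the 01:00 landing) instead of S3 event triggers, a Glue trigger could be created with boto3 along these lines. The trigger and job names are hypothetical:

```python
# Sketch: schedule the Glue job for 01:05 UTC daily using a Glue trigger.
import boto3

glue = boto3.client("glue")

glue.create_trigger(
    Name="nightly-csv-to-parquet",                # hypothetical trigger name
    Type="SCHEDULED",
    Schedule="cron(5 1 * * ? *)",                 # 01:05 UTC every day
    Actions=[{"JobName": "csv-to-parquet-job"}],  # hypothetical job name
    StartOnCreation=True,
)
```

Either trigger style keeps the pipeline serverless; the cron schedule simply trades event-driven latency for a fixed start time that still fits the 30-minute window.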