AWS Certified Data Engineer Associate DEA-C01 Practice Question

Your company receives hourly comma-separated values (CSV) log files in an Amazon S3 prefix. Data analysts use Amazon Athena for ad hoc queries, but scan costs and runtimes are increasing as the dataset grows. As a data engineer, you must convert both existing and future files to an optimized columnar format, partition the data by event_date, and avoid managing any servers or long-running clusters.

Which solution MOST cost-effectively meets these requirements?

Modify the source application to write Parquet files directly to the target S3 prefix and drop the existing CSV files once verified.
Provision an Amazon EMR cluster with Apache Hive, run a CREATE EXTERNAL TABLE … STORED AS ORC statement to convert the CSV data to ORC, and keep the cluster running to process new hourly files.
Create an AWS Glue crawler to catalog the CSV files, then schedule an AWS Glue Spark job that reads the crawler's table, writes Snappy-compressed Parquet files partitioned by event_date to a new S3 prefix, and updates the Data Catalog.
Enable S3 Storage Lens and apply Lifecycle rules to transition the CSV objects to the S3 Glacier Flexible Retrieval storage class after 30 days to reduce storage and Athena scan costs.
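
The partitioning requirement in the scenario can be pictured with a small sketch. This is not an AWS job definition, only a plain-Python illustration of a Hive-style layout, in which each distinct event_date value becomes its own prefix (event_date=YYYY-MM-DD/) so that a query filtering on the date only needs to read the matching prefixes. The sample rows, the user_id and action columns, and the partition_rows helper are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import defaultdict


def partition_rows(csv_text, date_column="event_date"):
    """Group CSV rows under Hive-style partition prefixes (hypothetical helper)."""
    partitions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The partition value is encoded in the prefix, so it is removed
        # from the row itself, as is usual for Hive-style layouts.
        prefix = f"{date_column}={row.pop(date_column)}/"
        partitions[prefix].append(row)
    return dict(partitions)


# Hypothetical sample of hourly log rows.
sample = """event_date,user_id,action
2024-05-01,u1,login
2024-05-01,u2,click
2024-05-02,u1,logout
"""

layout = partition_rows(sample)
for prefix, rows in sorted(layout.items()):
    print(prefix, len(rows), "rows")
```

A query restricted to a single event_date would then touch only one of these prefixes rather than the whole dataset, which is the reason the scenario asks for partitioning alongside a columnar format.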

AWS Certified Data Engineer Associate DEA-C01

Data Ingestion and Transformation
