AWS Certified Data Engineer Associate DEA-C01 Practice Question

A company ingests web clickstream logs into an S3 data lake. Analysts query the data with Amazon Athena. Queries typically target the most recent seven days but occasionally scan months of historical data. The data volume is about 5 TB per day. What is the most cost-effective way to organize the data in S3 to minimize Athena query runtimes and scan costs?

  • Store the logs as GZIP-compressed CSV files in a single prefix without partitions.

  • Convert the logs to ORC format but leave compression disabled to maximize read speed.

  • Convert the logs to columnar Parquet files, compress them, and partition the S3 prefix by event date (year/month/day).

  • Partition the logs by user ID and keep them as uncompressed JSON lines.
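The scan-cost arithmetic behind the scenario can be roughed out as follows. This is a minimal sketch, not a sizing tool. The 5 TB per day figure comes from the question. The bucket name, the Hive-style `year=/month=/day=` key naming, the price of 5 USD per TB scanned, and the `size_ratio` parameter (a stand-in for whatever reduction compression and column pruning might give) are all illustrative assumptions.

```python
from datetime import date, timedelta

DAILY_TB = 5.0          # from the question: about 5 TB of logs per day
PRICE_PER_TB_USD = 5.0  # assumption: illustrative price per TB scanned


def partition_prefix(base: str, day: date) -> str:
    """Build a Hive-style year/month/day prefix for one event date."""
    return f"{base}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_range(base: str, end: date, days: int) -> list:
    """List the prefixes a query over the last `days` days would read."""
    return [partition_prefix(base, end - timedelta(days=i)) for i in range(days)]


def scan_cost_usd(days_scanned: int, size_ratio: float = 1.0) -> float:
    """Estimate scan cost; size_ratio is the hypothetical fraction of raw bytes read."""
    return days_scanned * DAILY_TB * size_ratio * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Hypothetical bucket and end date, used only for illustration.
    base = "s3://example-bucket/clickstream"
    recent = prefixes_for_range(base, date(2024, 3, 3), 7)
    print(recent[0])
    print(recent[-1])
    print("7 days, raw size:", scan_cost_usd(7))
    print("90 days, raw size:", scan_cost_usd(90))
    print("7 days, 10% of raw size:", scan_cost_usd(7, size_ratio=0.1))
```

Under these assumptions, a query limited to seven date partitions reads far less data than one that must scan an unpartitioned prefix holding months of logs, and any reduction in bytes read per day scales the cost down proportionally.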

Exam domain: Data Store Management