AWS Certified Data Engineer Associate DEA-C01 Practice Question
An e-commerce company stores 5 TB of click-stream data in Amazon S3 as CSV files (about 120 columns). Analysts query the data with Amazon Athena, typically selecting only 3-5 columns. Queries take several minutes, and the monthly Athena charge for data scanned is rising rapidly. As the data engineer, which change will most effectively reduce both query latency and Athena cost while letting analysts keep their existing SQL?
Enable server-side GZIP compression on the existing CSV files by using S3 Storage Lens.
Load the data into Amazon RDS for PostgreSQL and create column indexes on the frequently queried columns.
Compact the CSV files into larger files and enable S3 Transfer Acceleration for the bucket.
Convert the CSV files to Snappy-compressed Parquet and store them in the same S3 bucket.
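The correct choice is to convert the CSV files to Snappy-compressed Parquet. Athena pricing is based on the number of bytes it scans. Parquet is a columnar format, so a query that selects 3-5 of roughly 120 columns reads only those column chunks instead of every row in full, and Snappy compression shrinks the bytes read further. Scanning less data lowers both cost and latency, and once the Athena table definition points at the Parquet data under the same table and column names, the analysts' SQL runs unchanged. The other options fall short. S3 Storage Lens is a storage analytics feature and cannot compress objects, and even GZIP-compressed CSV remains row-oriented, so Athena must still read every column of every row. Moving the data to RDS for PostgreSQL adds unnecessary cost and scaling limits. Compacting CSV files reduces per-file overhead but not the bytes scanned, and S3 Transfer Acceleration speeds up transfers into and out of the bucket without affecting Athena scan size or latency.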
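The effect on the bill can be approximated with simple arithmetic. The sketch below is a rough estimate, not a measurement: the price of $5.00 per TB scanned, the 3x compression ratio, and the assumption that all 120 columns are about the same size are illustrative values that do not come from the question, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of Athena scan cost for one full-table query.
# All constants marked "assumed" are illustrative, not taken from the question.

PRICE_PER_TB_USD = 5.00   # assumed on-demand price per TB scanned; check your Region
DATASET_TB = 5.0          # from the question: 5 TB of CSV
TOTAL_COLUMNS = 120       # from the question: about 120 columns
COMPRESSION_RATIO = 3.0   # assumed size reduction from Parquet + Snappy


def csv_scan_tb(dataset_tb: float) -> float:
    """CSV is row-oriented, so a query reads every byte of every file."""
    return dataset_tb


def parquet_scan_tb(dataset_tb: float, selected_columns: int,
                    total_columns: int, compression_ratio: float) -> float:
    """Columnar storage lets the engine read only the selected columns.

    Assumes every column is roughly the same size, which real data rarely is.
    """
    column_fraction = selected_columns / total_columns
    return dataset_tb * column_fraction / compression_ratio


def cost_usd(scanned_tb: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost of a single query that scans the given number of terabytes."""
    return scanned_tb * price_per_tb


if __name__ == "__main__":
    csv_cost = cost_usd(csv_scan_tb(DATASET_TB))
    parquet_cost = cost_usd(
        parquet_scan_tb(DATASET_TB, 5, TOTAL_COLUMNS, COMPRESSION_RATIO)
    )
    print(f"CSV full scan:      ${csv_cost:.2f}")
    print(f"Parquet, 5 columns: ${parquet_cost:.2f}")
```

Under these assumptions a query that touches the whole dataset costs about $25 against CSV and well under $1 against Parquet. Real savings depend on actual column sizes, the achieved compression ratio, and whether partitioning lets Athena skip files entirely.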
Data Store Management