AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company stores six months of IoT sensor readings as GZIP-compressed CSV files in an Amazon S3 data lake. Business analysts use Amazon Athena to run ad-hoc queries several times per day and are concerned about high query latency and the cost of data scanned. Without changing the SQL that analysts run, which approach will most effectively reduce both latency and Athena query costs?
Re-encode the files as line-delimited JSON and keep them GZIP-compressed in S3.
Convert the CSV files to columnar Parquet format and compress them with Snappy before storing them in S3.
Load the data into an Amazon RDS database and access it through Athena federated queries.
Split the existing CSV files into 128 MB objects to increase parallelism when Athena reads them.
Athena charges based on the amount of data it scans in Amazon S3. Parquet stores data by column, so Athena reads only the columns a query references, and Snappy compression shrinks those columns further. The result is far fewer bytes scanned than with row-oriented CSV, where every query must read whole files. As long as the table keeps the same name and columns, this lowers data-scanned charges and improves query performance without requiring analysts to modify their SQL.
Re-encoding the data as JSON keeps the row-oriented layout and repeats field names in every record, which enlarges the files and increases scan cost. Splitting the CSV files improves parallelism but does not change the volume of bytes read, so any cost savings are minimal. Moving the data to Amazon RDS and using federated queries does not remove Athena charges, because federated queries are still billed on the data Athena scans from the source. It also introduces a different service, higher storage cost, and schema changes, violating the requirement to keep the same query logic.
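The column-pruning argument above can be illustrated with a small, self-contained sketch. It is not Parquet and does not call Athena: it is a toy model that compares one GZIP-compressed CSV blob against one independently compressed blob per column. The sensor schema, the generated readings, and every function name are hypothetical, and zlib stands in for Snappy because Snappy is not in the Python standard library.

```python
import csv
import gzip
import io
import random
import zlib

# Hypothetical sensor schema, chosen only for illustration.
COLUMNS = ["device_id", "ts", "temperature", "humidity", "status"]


def make_readings(n, seed=0):
    """Generate n made-up sensor rows in COLUMNS order."""
    rng = random.Random(seed)
    return [
        (
            f"dev-{rng.randrange(50):03d}",
            1_700_000_000 + i * 60,
            round(rng.uniform(15.0, 35.0), 2),
            round(rng.uniform(20.0, 90.0), 2),
            rng.choice(["OK", "OK", "OK", "WARN"]),
        )
        for i in range(n)
    ]


def encode_row_oriented(rows):
    """Row layout: a single GZIP-compressed CSV object."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))


def encode_column_oriented(rows):
    """Toy columnar layout: one compressed blob per column (zlib, not Snappy)."""
    blobs = {}
    for idx, name in enumerate(COLUMNS):
        text = "\n".join(str(row[idx]) for row in rows)
        blobs[name] = zlib.compress(text.encode("utf-8"))
    return blobs


def bytes_scanned_row(blob, wanted):
    """A row-oriented reader must read the whole object, whatever is wanted."""
    return len(blob)


def bytes_scanned_columnar(blobs, wanted):
    """A columnar reader touches only the blobs for the requested columns."""
    return sum(len(blobs[name]) for name in wanted)


def avg_temperature_columnar(blobs):
    """Answer an AVG(temperature)-style query from the temperature blob alone."""
    values = zlib.decompress(blobs["temperature"]).decode("utf-8").split("\n")
    return sum(float(v) for v in values) / len(values)


rows = make_readings(10_000)
row_blob = encode_row_oriented(rows)
col_blobs = encode_column_oriented(rows)

wanted = ["temperature"]
row_bytes = bytes_scanned_row(row_blob, wanted)
col_bytes = bytes_scanned_columnar(col_blobs, wanted)

print(f"row-oriented bytes read: {row_bytes}")
print(f"columnar bytes read:     {col_bytes}")
print(f"average temperature:     {avg_temperature_columnar(col_blobs):.2f}")
```

The query result is the same either way; only the number of bytes that must be read differs. That is the quantity Athena bills on, which is why the same SQL can become cheaper once the underlying files are columnar.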
Data Store Management