AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineering team receives daily application logs as tab-delimited .txt files in Amazon S3. About 50,000 small files totaling 200 GB uncompressed are added each day. Analysts use Amazon Athena to run ad-hoc queries that scan an entire month of logs, and costs are increasing. Without deploying new managed infrastructure or altering the source system, which action best reduces Athena scan cost and improves query performance?
Import the .txt files into Amazon RDS for PostgreSQL and point Athena federated queries at the database.
Use an AWS Glue ETL job to convert the .txt files to compressed Parquet, partition the data by day, and store the results in a separate S3 prefix.
Merge all .txt files into a single large GZIP file with Amazon S3 Batch Operations and query the compressed file directly with Athena.
Enable Athena query result reuse and instruct analysts to rerun saved queries instead of writing new ones.
The best action is the AWS Glue ETL job that converts the tab-delimited .txt files to compressed Parquet and partitions the data by day. Athena charges for the amount of data each query scans, so the goal is to make every query read fewer bytes. Parquet is columnar, so Athena reads only the columns a query references, and compression shrinks those columns further. Partitioning by day lets Athena skip every partition that falls outside a query's date filter. Merging the text files into a single GZIP file does not help: the data is still row-oriented text, so Athena must decompress and read every row and every column, and a GZIP file is not splittable, so one large file cannot be read in parallel. Loading the data into Amazon RDS introduces new infrastructure and does nothing to reduce the bytes Athena scans in S3. Query result reuse helps only when analysts rerun identical queries, which is not the case for frequent ad-hoc analysis.
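The effect on scan volume can be illustrated with a rough calculation. The sketch below is not a measurement: only the 200 GB per day figure comes from the question. The price per TB, the Parquet size ratio, and the fraction of columns a query reads are hypothetical values chosen for illustration, and the helper names scanned_gb and cost_usd are made up for this example.

```python
# Back-of-the-envelope Athena scan estimate for the scenario in the question.
# Constants marked ASSUMPTION are illustrative and do not come from the source.

DAILY_RAW_GB = 200         # from the question: 200 GB of uncompressed text per day
DAYS_IN_MONTH = 30         # ASSUMPTION: a 30-day month
PRICE_PER_TB_USD = 5.00    # ASSUMPTION: on-demand price per TB scanned; check your Region
PARQUET_SIZE_RATIO = 0.25  # ASSUMPTION: compressed Parquet is 1/4 the size of the raw text
COLUMN_FRACTION = 0.20     # ASSUMPTION: a query reads 20% of the stored column bytes


def scanned_gb(days, daily_gb, size_ratio=1.0, column_fraction=1.0):
    """Estimate GB scanned for a query covering `days` daily partitions."""
    return days * daily_gb * size_ratio * column_fraction


def cost_usd(gb, price_per_tb=PRICE_PER_TB_USD):
    """Convert GB scanned to dollars (simplification: 1 TB = 1000 GB)."""
    return gb / 1000 * price_per_tb


# Raw text: every query over a month reads every byte of every file.
text_month_gb = scanned_gb(DAYS_IN_MONTH, DAILY_RAW_GB)

# Parquet: smaller files, and only the referenced columns are read.
parquet_month_gb = scanned_gb(
    DAYS_IN_MONTH, DAILY_RAW_GB, PARQUET_SIZE_RATIO, COLUMN_FRACTION
)

# Parquet plus a one-day date filter: partition pruning skips the other 29 days.
parquet_day_gb = scanned_gb(1, DAILY_RAW_GB, PARQUET_SIZE_RATIO, COLUMN_FRACTION)

for label, gb in [
    ("text, full month", text_month_gb),
    ("Parquet, full month", parquet_month_gb),
    ("Parquet, single day", parquet_day_gb),
]:
    print(f"{label:<22} {gb:>8,.0f} GB  ${cost_usd(gb):,.2f}")
```

Under these assumed ratios, a full-month query drops from about 6,000 GB scanned to about 300 GB, and a query filtered to one day reads about 10 GB. Real savings depend on how well the logs compress and how many columns each query touches.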
Exam domain: Data Store Management