AWS Certified Data Engineer Associate DEA-C01 Practice Question
An application writes 2 TB of structured transactional data as comma-separated files to an S3 bucket each day. Analysts query the data with Amazon Athena and experience long runtimes and high scan charges. A data engineer will add a nightly AWS Glue Spark job to transform the data. Which transformation will best address the volume characteristics while retaining the relational schema?
Compress the existing CSV files with Gzip and remove all header rows.
Convert the files to Apache Parquet, apply Snappy compression, and partition the dataset by transaction_date.
Split each CSV file into chunks no larger than 128 MB to increase Athena parallelism.
Merge all daily CSV files into a single uncompressed file to reduce S3 object overhead.
Columnar formats such as Apache Parquet store values together by column rather than by row, so Athena can read only the columns referenced in a query instead of every field in every record. Snappy compression further reduces the amount of data stored and scanned without adding excessive CPU overhead. Adding a partition key such as transaction_date lets Athena read only the partitions that match a query's filter predicate, which sharply limits the amount of data scanned per query. Parquet also stores column names and types with the data, so the relational schema is retained. The other options leave the data row-oriented: Gzip shrinks the bytes Athena reads, and merging or splitting files only changes the object count, but in every case Athena must still read every column of every row across the whole dataset. None of them comes close to the reduction in scan cost and latency that columnar storage combined with partition pruning provides for large structured datasets.
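The effect described above can be roughed out with some arithmetic. The sketch below is a hypothetical estimator, not an AWS tool: the size ratios, the 30-day retention window, the 3-of-20 column fraction (0.15), and the $5 per TB price are all illustrative assumptions, so measure real ratios and check current Athena pricing before relying on any of the numbers.

```python
from dataclasses import dataclass

# Illustrative assumption: on-demand price per TB scanned (verify for your region).
PRICE_PER_TB_USD = 5.0


@dataclass(frozen=True)
class Layout:
    name: str
    size_ratio: float        # stored size relative to raw CSV (assumed, not measured)
    prunes_columns: bool     # columnar formats let the engine skip unreferenced columns
    prunes_partitions: bool  # date partitions let the engine skip non-matching days


# Hypothetical layouts mirroring the answer options; ratios are guesses for illustration.
LAYOUTS = [
    Layout("raw CSV", 1.0, False, False),
    Layout("gzip CSV", 0.25, False, False),
    Layout("Parquet + Snappy, partitioned by transaction_date", 0.30, True, True),
]


def scanned_tb(layout, raw_tb_per_day, days_retained, days_queried, column_fraction):
    """Estimate the TB scanned by one query under the given layout."""
    total = raw_tb_per_day * days_retained * layout.size_ratio
    if layout.prunes_partitions:
        # Only the partitions matching the date predicate are read.
        total *= days_queried / days_retained
    if layout.prunes_columns:
        # Only the referenced columns are read.
        total *= column_fraction
    return total


def estimate_all(raw_tb_per_day=2.0, days_retained=30, days_queried=1, column_fraction=0.15):
    """Return {layout name: estimated TB scanned} for every layout above."""
    return {
        layout.name: scanned_tb(layout, raw_tb_per_day, days_retained,
                                days_queried, column_fraction)
        for layout in LAYOUTS
    }


if __name__ == "__main__":
    for name, tb in estimate_all().items():
        print(f"{name}: {tb:.2f} TB scanned, about ${tb * PRICE_PER_TB_USD:.2f}")
```

Under these assumed inputs, a query for one day of data touching 3 of 20 columns scans roughly 60 TB against raw CSV, 15 TB against Gzip CSV, and 0.09 TB against partitioned Parquet. The absolute figures are only as good as the assumptions, but the ordering shows why compression alone helps far less than column pruning and partition pruning combined.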
Data Ingestion and Transformation