AWS Certified Data Engineer Associate DEA-C01 Practice Question
An Amazon Athena table stores clickstream events as Parquet files in an S3 location partitioned by year, month, and day. A nightly ETL job currently runs the following query and is incurring high scan costs:
SELECT user_id, page, event_time FROM clickstream WHERE event_time BETWEEN date '2023-07-01' AND date '2023-07-31';
How should you rewrite the SQL to scan the least amount of data without changing the table definition?
Include an ORDER BY year, month, day clause to ensure the data is read in partition order.
Append a LIMIT clause so the statement becomes:
SELECT user_id, page, event_time FROM clickstream WHERE event_time BETWEEN date '2023-07-01' AND date '2023-07-31' LIMIT 100000;
Create a common table expression (CTE) that selects all columns and then filter the CTE on event_time within the main query.
Add a filter on the partition columns, for example:
SELECT user_id, page, event_time FROM clickstream WHERE year = 2023 AND month = 7 AND day BETWEEN 1 AND 31;
Athena partitions are stored as separate folders in Amazon S3. When a query's WHERE clause references the partition columns, Athena prunes the unrelated partitions and reads only the relevant files, which reduces the amount of data scanned and lowers cost. Filtering solely on event_time does not use partition pruning because that column is stored inside the files, not in the partition path. A LIMIT clause, ORDER BY, or a common table expression does not affect how much data is read from S3, so they provide no scan-cost benefit.
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is partition pruning in Amazon Athena?
Open an interactive chat with Bash
Why doesn’t filtering on non-partitioned columns reduce scan costs?
Open an interactive chat with Bash
How does partitioning in Athena improve query performance?
Open an interactive chat with Bash
AWS Certified Data Engineer Associate DEA-C01
Data Ingestion and Transformation
Your Score:
Report Issue
Bash, the Crucial Exams Chat Bot
AI Bot
Loading...
Loading...
Loading...
Pass with Confidence.
IT & Cybersecurity Package
You have hit the limits of our free tier, become a Premium Member today for unlimited access.
Military, Healthcare worker, Gov. employee or Teacher? See if you qualify for a Community Discount.
Monthly
$19.99
$19.99/mo
Billed monthly, Cancel any time.
3 Month Pass
$44.99
$14.99/mo
One time purchase of $44.99, Does not auto-renew.
MOST POPULAR
Annual Pass
$119.99
$9.99/mo
One time purchase of $119.99, Does not auto-renew.
BEST DEAL
Lifetime Pass
$189.99
One time purchase, Good for life.
What You Get
All IT & Cybersecurity Package plans include the following perks and exams .