AWS Certified Data Engineer Associate DEA-C01 Practice Question
An AWS Glue Spark job ingests click-stream data stored in Amazon S3 as Parquet files partitioned by the column event_date (YYYY-MM-DD). The job is run daily with a job parameter DATE=2025-10-01, but the code currently executes:
The team reports that the job scans several terabytes and exceeds its 15-minute SLA. Which change will MOST effectively reduce the job's runtime with minimal additional cost?
Insert df = df.repartition(1) immediately after the filter to minimize the number of output files.
Double the executor and driver memory in the Glue job's Spark configuration.
Change the read path to s3://analytics/clicks/event_date=${DATE}/ (or pass a push_down_predicate for event_date) so Spark loads only the matching partition.
Call df.cache() before all downstream transformations to keep the dataset in memory.
Reading only the required partition eliminates unnecessary I/O and the downstream work of filtering and shuffling rows that are discarded anyway. Supplying the partition value in the read path, or passing push_down_predicate when the DynamicFrame is created from the Data Catalog, lets Glue prune partitions before any data is loaded, so the job touches only a few gigabytes. Caching, repartitioning, or adding memory does not address the primary bottleneck, which is scanning unneeded data, so those changes yield limited or negative performance gains while adding cost.
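The two pruning approaches named in the correct answer can be sketched as follows. This is a minimal illustration, not the team's actual job code: the helper names are hypothetical, and the catalog database `analytics` and table `clicks` are assumptions inferred from the S3 path in the question. The Glue and Spark objects are passed in as parameters so the string-building helpers can be checked without a Glue runtime.

```python
from datetime import date


def partition_path(base: str, run_date: str) -> str:
    """Build the S3 prefix for a single event_date partition."""
    date.fromisoformat(run_date)  # raises ValueError if not YYYY-MM-DD
    return f"{base.rstrip('/')}/event_date={run_date}/"


def partition_predicate(run_date: str) -> str:
    """Build a predicate string that selects one event_date partition."""
    date.fromisoformat(run_date)  # raises ValueError if not YYYY-MM-DD
    return f"event_date = '{run_date}'"


def read_with_predicate(glue_context, run_date: str):
    """Option 1: prune partitions through the Glue Data Catalog.

    Assumes a catalog table (hypothetical names below) that is
    partitioned by event_date.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database="analytics",   # assumed database name
        table_name="clicks",    # assumed table name
        push_down_predicate=partition_predicate(run_date),
    )


def read_with_path(spark, base: str, run_date: str):
    """Option 2: point Spark directly at the partition prefix.

    Setting basePath keeps event_date as a column in the result;
    without it, Spark omits the partition column when a partition
    directory is read directly.
    """
    return (
        spark.read
        .option("basePath", base)
        .parquet(partition_path(base, run_date))
    )
```

In a Glue job, `run_date` would typically come from the `DATE` job parameter, for example through `getResolvedOptions(sys.argv, ["DATE"])`. Either option restricts the S3 listing and read to one partition, instead of loading every partition and filtering afterwards.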
Data Ingestion and Transformation