AWS Certified Data Engineer Associate DEA-C01 Practice Question
An AWS Glue Spark job processes about 1 TB of JSON data stored as roughly 200,000 small objects of about 5 MB each under a single S3 prefix. CloudWatch metrics show excessive shuffle and high driver memory usage, and the job currently runs for three hours. The data engineer must improve runtime without increasing compute cost. Which action follows AWS performance-tuning best practices?
Use S3DistCp to combine the small source files into larger objects and then write the job's output as partitioned Parquet files.
Increase the Glue job's DPU allocation from 10 to 20 to give executors more memory.
Reconfigure the job to run in streaming mode and enable continuous logging.
Create a CloudWatch alarm that stops the job if shuffle memory exceeds a threshold.
A large number of small 5 MB files forces Spark to list and track hundreds of thousands of objects and to create many partitions, which increases shuffle and driver overhead. Consolidating these files into fewer, larger objects reduces partition count and network traffic and lets each task process more data per read. Using S3DistCp (or a Glue compaction blueprint) to group the files is a low-cost preprocessing step; combined with writing the job's output in a columnar, partitioned format such as Parquet, it typically yields the most significant performance gain. Allocating more DPUs may help but raises cost, and switching to streaming mode or adding alarms does not address the shuffle bottleneck.
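To make the scale of the problem concrete, the sketch below works through the arithmetic behind the explanation: how many input objects the job reads before and after compaction, and what an S3DistCp invocation for that compaction might look like. The 128 MiB target size, the bucket paths, and the `--groupBy` pattern are illustrative assumptions rather than values from the question; `--src`, `--dest`, `--groupBy`, and `--targetSize` are standard S3DistCp options.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)

TOTAL_BYTES = 1_000_000_000_000      # about 1 TB of JSON, as in the question
SMALL_FILE_BYTES = 5_000_000         # about 5 MB per source object
TARGET_MIB = 128                     # assumed compaction target, not from the question
TARGET_BYTES = TARGET_MIB * 1024 * 1024

# Object counts before and after compaction.
files_before = ceil_div(TOTAL_BYTES, SMALL_FILE_BYTES)
files_after = ceil_div(TOTAL_BYTES, TARGET_BYTES)

def s3distcp_args(src: str, dest: str, group_by: str, target_mib: int) -> list[str]:
    """Build an S3DistCp argument list (for example, for an EMR step).

    The paths and the group_by pattern passed in are hypothetical.
    """
    return [
        "s3-dist-cp",
        "--src", src,
        "--dest", dest,
        "--groupBy", group_by,            # files sharing the captured group are concatenated
        "--targetSize", str(target_mib),  # approximate output object size in MiB
    ]

# Hypothetical bucket names and file-name pattern, for illustration only.
args = s3distcp_args(
    src="s3://example-bucket/raw/",
    dest="s3://example-bucket/compacted/",
    group_by=r".*/(events).*\.json",
    target_mib=TARGET_MIB,
)

print(f"objects before compaction: {files_before}")
print(f"objects after compaction:  {files_after}")
print(" ".join(args))
```

Under these assumptions the object count drops by more than an order of magnitude, which is why compaction relieves the driver and the shuffle without adding DPUs. Concatenating JSON objects this way assumes newline-delimited records.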
Data Operations and Support