AWS Certified Data Engineer Associate DEA-C01 Practice Question
You run an AWS Glue 3.0 Spark job written in Python that reads 50,000 gzip-compressed JSON files (about 100 KB each) from one Amazon S3 prefix, transforms the data, and writes Parquet files back to S3. The job uses the default 10 G.1X DPUs and currently completes in eight hours while average CPU utilization stays under 30 percent. Which modification will most improve performance without increasing cost?
Write the Parquet output with the Zstandard compression codec to shrink the file sizes.
Add --conf spark.executor.memory=16g to the job parameters to increase executor heap size.
Enable AWS Glue job bookmarking so previously processed files are skipped.
Use create_dynamic_frame_from_options with connection_options {"groupFiles": "inPartition", "groupSize": "134217728"} so Glue combines many small objects before processing.
Grouping the small files on read is the correct choice. When a Spark job must open and schedule tens of thousands of very small objects, task-startup overhead, network calls, and driver pressure dominate the run time even though CPU usage is low. AWS Glue lets you reduce that overhead by grouping files as they are read. Setting the S3 connection option "groupFiles" to "inPartition" and specifying an appropriate "groupSize" causes Glue to combine many small objects into larger logical partitions before they reach the executors, decreasing the number of tasks that must be scheduled and letting each task do more useful work. Because this change does not request additional DPUs, cost remains the same.
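As a minimal sketch, the grouped read might look like the following in a Glue script. The bucket path is hypothetical, and the helper that builds the options is purely illustrative; the key detail is that "groupSize" is passed as a string of bytes (134217728 bytes = 128 MiB, the value from the correct option).

```python
# Build the S3 connection_options that tell AWS Glue to coalesce many
# small objects into larger logical groups before tasks are scheduled.
# Hypothetical helper for illustration; groupSize is a byte count as a string.

def grouping_options(group_size_mib: int = 128) -> dict:
    """Return connection_options for a grouped S3 read in AWS Glue."""
    return {
        "paths": ["s3://my-bucket/json-input/"],  # hypothetical prefix
        "groupFiles": "inPartition",
        "groupSize": str(group_size_mib * 1024 * 1024),  # Glue expects a string
    }

# In the Glue 3.0 job, these options would be passed to the reader, e.g.:
#   dyf = glueContext.create_dynamic_frame_from_options(
#       connection_type="s3",
#       connection_options=grouping_options(),
#       format="json",
#   )

print(grouping_options()["groupSize"])
```

With the default of 128 MiB, the printed value is "134217728", matching the groupSize in the answer choice.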
Increasing executor memory does not address the task-scheduling overhead that is the primary bottleneck, and AWS Glue sets executor memory according to the worker type (G.1X, G.2X, and so on) rather than honoring an arbitrary --conf override.
Changing the Parquet compression codec affects the write phase, not the excessive read-side task creation.
Job bookmarking only helps skip files that were processed in earlier runs; it does not speed up processing of the current data set.
Exam domain: Data Ingestion and Transformation