AWS Certified Data Engineer Associate DEA-C01 Practice Question

An AWS Glue Spark job processes about 1 TB of JSON data stored as roughly 200,000 small objects of about 5 MB each in a single S3 prefix. CloudWatch metrics show excessive shuffle and high driver memory usage, and the job currently runs for three hours. The data engineer must improve runtime without increasing compute cost. Which action follows AWS performance-tuning best practices?

  • Use S3DistCp to combine the small source files into larger objects and then write the job's output as partitioned Parquet files.

  • Increase the Glue job's DPU allocation from 10 to 20 to give executors more memory.

  • Reconfigure the job to run in streaming mode and enable continuous logging.

  • Create a CloudWatch alarm that stops the job if shuffle memory exceeds a threshold.
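
To make the scale in the scenario concrete, the following is a minimal Python sketch that checks the object count implied by the question and estimates how many objects would remain if the small files were combined, as the first option describes. The 128 MB target size, the decimal units, the bucket paths, and the helper names `object_count` and `s3distcp_args` are illustrative assumptions, not values from the question. The `--src`, `--dest`, `--groupBy`, and `--targetSize` flags are S3DistCp options, but the regular expression shown is only an example, and plain concatenation is only safe for line-delimited JSON.

```python
# Decimal units are an assumption; the question only says "about 1 TB" and "5 MB".
TB = 10**12
MB = 10**6


def object_count(total_bytes: int, object_bytes: int) -> int:
    """Number of objects needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // object_bytes)


def s3distcp_args(src: str, dest: str, target_mib: int) -> list[str]:
    """Build an illustrative s3-dist-cp argument list that groups small files.

    The groupBy pattern below is a hypothetical example: files whose names
    match the pattern and share the same captured group are combined.
    """
    return [
        "s3-dist-cp",
        "--src", src,
        "--dest", dest,
        "--groupBy", r".*(\.json)",
        "--targetSize", str(target_mib),  # S3DistCp interprets this in MiB
    ]


small_files = object_count(1 * TB, 5 * MB)      # objects in the current layout
merged_files = object_count(1 * TB, 128 * MB)   # objects after combining (assumed target)

print(f"current objects: {small_files}")
print(f"objects at ~128 MB each: {merged_files}")

# Hypothetical bucket paths, for illustration only.
print(" ".join(s3distcp_args("s3://example-bucket/raw/", "s3://example-bucket/merged/", 128)))
```

Under these assumptions the input shrinks from 200,000 objects to fewer than 8,000, which is the kind of reduction that lowers the per-file listing and task-tracking overhead on the Spark driver.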

Data Operations and Support