AWS Certified Data Engineer Associate DEA-C01 Practice Question

You run an AWS Glue 3.0 Spark job written in Python that reads 50,000 gzip-compressed JSON files (about 100 KB each) from one Amazon S3 prefix, transforms the data, and writes Parquet files back to S3. The job runs with the default of 10 G.1X DPUs and currently completes in eight hours, while average CPU utilization stays under 30 percent. Which modification will most improve performance without increasing cost?

  • Write the Parquet output with the Zstandard compression codec to shrink the file sizes.

  • Add --conf spark.executor.memory=16g to the job parameters to increase executor heap size.

  • Enable AWS Glue job bookmarking so previously processed files are skipped.

  • Use create_dynamic_frame_from_options with connection_options {"groupFiles": "inPartition", "groupSize": "134217728"} so Glue combines many small objects before processing.
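The final option describes AWS Glue's file-grouping feature, which targets the small-files problem implied by the scenario: with 50,000 objects of roughly 100 KB each, Spark schedules one task per file, so the job is dominated by S3 request and task-scheduling overhead rather than CPU work, which is consistent with utilization staying under 30 percent. Below is a minimal sketch of how a Glue job script might apply those connection options; the bucket names and paths are placeholders, not values from the question.

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    # Hypothetical paths for illustration only.
    INPUT_PATH = "s3://example-bucket/raw-json/"
    OUTPUT_PATH = "s3://example-bucket/curated-parquet/"

    glue_context = GlueContext(SparkContext.getOrCreate())

    # groupFiles/groupSize tell Glue to coalesce many small S3 objects into
    # ~128 MB groups per task instead of creating one task per tiny file.
    # Glue decompresses the gzip JSON input automatically.
    dyf = glue_context.create_dynamic_frame_from_options(
        connection_type="s3",
        connection_options={
            "paths": [INPUT_PATH],
            "recurse": True,
            "groupFiles": "inPartition",
            "groupSize": "134217728",  # 128 MB, expressed in bytes
        },
        format="json",
    )

    # ... transformations would go here ...

    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": OUTPUT_PATH},
        format="parquet",
    )

A groupSize of 134217728 bytes equals 128 MiB, a common Spark partition target, so each task reads a reasonably sized chunk and the job produces well-sized Parquet output without adding DPUs or cost.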

Domain: Data Ingestion and Transformation