AWS Certified Data Engineer Associate DEA-C01 Practice Question

A media company performs a weekly Spark ETL job on 40 TB of log files stored in Amazon S3. Intermediate shuffle data is needed only while the job runs; after the job finishes, the cluster is terminated. The team wants to minimize storage costs yet maintain high I/O throughput for the shuffle phase. Which solution meets these requirements?

  • Copy the data into an Amazon EFS file system and run the ETL using Spark containers on AWS Fargate.

  • Create an Amazon EMR cluster and attach gp3 EBS volumes sized to store the entire 40 TB dataset; enable EMRFS consistent view for metadata operations.

  • Load the data into an Amazon Redshift RA3 cluster with managed storage and perform the transformation using Redshift Spectrum.

  • Launch an Amazon EMR cluster that uses EC2 instances with locally attached NVMe instance store volumes as HDFS storage, and avoid provisioning additional EBS volumes.
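For context, a minimal sketch of what the instance-store-backed EMR option could look like using boto3 is shown below. The cluster name, instance types, instance counts, and region are illustrative assumptions rather than details taken from the question; the key idea is that storage-optimized instances expose local NVMe instance store that EMR uses for HDFS and Spark shuffle data without any extra EBS volumes, and the ephemeral storage disappears at no additional cost when the cluster terminates.

```python
import boto3

# Hypothetical example: launch a transient EMR cluster whose core nodes use
# local NVMe instance store (e.g. i3en instances) for HDFS / shuffle data.
emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

response = emr.run_job_flow(
    Name="weekly-log-etl",            # illustrative job name
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {
                "Name": "Primary",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            },
            {
                "Name": "Core-NVMe",
                "InstanceRole": "CORE",
                "InstanceType": "i3en.2xlarge",  # local NVMe instance store, no EBS added
                "InstanceCount": 10,             # sizing is an assumption
            },
        ],
        # Terminate the cluster once the ETL steps finish, so the
        # intermediate shuffle storage incurs no ongoing cost.
        "KeepJobFlowAliveWhenNoSteps": False,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```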

Data Store Management