AWS Certified Data Engineer Associate DEA-C01 Practice Question
A media company performs a weekly Spark ETL job on 40 TB of log files stored in Amazon S3. Intermediate shuffle data is needed only while the job runs; after the job finishes, the cluster is terminated. The team wants to minimize storage costs yet maintain high I/O throughput for the shuffle phase. Which solution meets these requirements?
A. Copy the data into an Amazon EFS file system and run the ETL using Spark containers on AWS Fargate.
B. Launch an Amazon EMR cluster that uses EC2 instances with locally attached NVMe instance store volumes as HDFS storage, and avoid provisioning additional EBS volumes.
C. Create an Amazon EMR cluster and attach gp3 EBS volumes sized to store the entire 40 TB dataset; enable EMRFS consistent view for metadata operations.
D. Load the data into an Amazon Redshift RA3 cluster with managed storage and perform the transformation using Redshift Spectrum.
The correct choice is the Amazon EMR cluster that uses locally attached NVMe instance store volumes. Instance store volumes are included in the price of the EC2 instances that make up the EMR cluster, so they add no separate storage charge. Because instance store is physically attached to the host, it offers the lowest-latency, highest-throughput storage available to the cluster, which suits temporary Spark shuffle files. Instance store is ephemeral: its contents are lost when the instances are terminated. That is acceptable here because the shuffle data is needed only while the job runs and the source logs remain in Amazon S3. EBS, EFS, and Redshift all introduce extra per-GB charges and additional network hops, making them more expensive and potentially slower for this workload. Therefore, configuring the EMR cluster to rely only on instance store for HDFS and local scratch space provides the required performance at the lowest cost.
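The configuration described above can be sketched in Python. This is a minimal illustration, not a production setup: it only builds the keyword arguments that boto3's EMR `run_job_flow` call accepts and does not contact AWS. The cluster name, instance type (`i3en.2xlarge`), release label, node count, and default role names are all assumptions chosen for the example. The point it shows is that no instance group carries an `EbsConfiguration` block, so no additional EBS volumes are requested, and that the cluster terminates once its steps finish.

```python
from typing import Any, Dict


def build_emr_request(core_count: int = 10,
                      instance_type: str = "i3en.2xlarge") -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Assumptions (not from the question): the instance type, node count,
    release label, cluster name, and default IAM role names below.
    """

    def group(name: str, role: str, count: int) -> Dict[str, Any]:
        # No "EbsConfiguration" key: the group requests no extra EBS
        # volumes and relies on the instance type's local NVMe storage.
        return {
            "Name": name,
            "InstanceRole": role,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    return {
        "Name": "weekly-log-etl",          # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",      # assumed EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                group("primary", "MASTER", 1),
                group("core", "CORE", core_count),
            ],
            # Terminate the cluster when the submitted steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request()
```

With AWS credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`, with the Spark job added through a `Steps` entry. The input and output data would stay in Amazon S3, so nothing of value is lost when the instance store disappears with the cluster.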
Exam domain: Data Store Management