AWS Certified Data Engineer Associate DEA-C01 Practice Question
An analytics team runs a Spark-based Amazon EMR cluster every night to transform 50 TB of web logs. The data is transient and can be regenerated at any time. The current long-running cluster keeps 10 TB of HDFS data on EBS volumes attached to core nodes, leading to high monthly storage charges. The team wants to keep job runtimes similar while minimizing storage cost. Which solution best meets these requirements?
Launch a transient EMR cluster that uses instance store-backed EC2 nodes (for example, r5d or i3) for HDFS and terminate the cluster after the nightly job completes.
Retain the long-running cluster but remove all local storage and access the data only through EMRFS consistent view backed by Amazon S3.
Keep the existing cluster and replace the current EBS volumes with lower-cost sc1 EBS volumes for HDFS.
Replace the Amazon EMR workload with AWS Glue Spark jobs that read from and write to Amazon S3.
The best choice is the transient EMR cluster on instance store-backed nodes. Instance store volumes provide high-throughput, low-latency storage whose cost is included in the instance's hourly price, so no separate EBS charges are incurred. Because the data is temporary and can be regenerated, losing the instance store contents when the cluster terminates is not a problem. Launching a transient cluster each night means storage exists only for the duration of the job, which eliminates the recurring EBS cost while keeping performance similar.
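As a rough illustration of the transient-cluster approach, the sketch below builds the keyword arguments that could be passed to boto3's `run_job_flow`. The cluster name, release label, instance sizes, node count, role names, and script location are all assumptions made for the example, not values from the question. The key settings are `KeepJobFlowAliveWhenNoSteps=False`, which lets the cluster terminate once its steps finish, and the use of `r5d` instance types with no `EbsConfiguration` block requested.

```python
def build_transient_cluster_request(script_s3_uri, core_count=10):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All names, sizes, and the release label are illustrative assumptions.
    """
    return {
        "Name": "nightly-weblog-transform",          # hypothetical name
        "ReleaseLabel": "emr-6.15.0",                # assumed release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "r5d.xlarge",    # instance store-backed
                    "InstanceCount": 1,
                    "Market": "ON_DEMAND",
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": "r5d.4xlarge",   # HDFS on local NVMe
                    "InstanceCount": core_count,
                    "Market": "ON_DEMAND",
                },
            ],
            # Terminate automatically when the last step completes.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "transform-web-logs",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "--deploy-mode", "cluster",
                        script_s3_uri,
                    ],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",        # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_transient_cluster_request("s3://example-bucket/jobs/transform.py")
# With boto3 installed and credentials configured, the request could be
# submitted with: boto3.client("emr").run_job_flow(**request)
```

A nightly scheduler (for example, an EventBridge rule or a workflow tool) would submit this request, so no cluster or HDFS storage exists between runs.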
Storing data exclusively in Amazon S3 through EMRFS removes EBS cost but adds network latency and S3 request overhead, which can lengthen Spark shuffle phases. Re-platforming to AWS Glue might reduce management overhead but can be more expensive for very large, shuffle-heavy workloads and would require significant refactoring. Switching to low-cost sc1 EBS volumes lowers price but also drastically reduces throughput, likely increasing job runtimes.
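To make the sc1 throughput concern concrete, here is a back-of-the-envelope sketch. It assumes sc1 baseline throughput of about 12 MiB/s per TiB with a per-volume cap of 250 MiB/s; these figures are recalled from AWS's published volume characteristics and should be checked against current documentation. The volume count and size are hypothetical, and the model ignores burst credits, HDFS replication, and instance-level bandwidth limits.

```python
MIB_PER_TIB = 1024 * 1024


def sc1_baseline_mib_s(volume_tib, per_tib=12.0, volume_cap=250.0):
    """Baseline throughput of one sc1 volume in MiB/s (assumed figures)."""
    return min(volume_cap, per_tib * volume_tib)


def hours_to_scan(data_tib, volume_count, volume_tib):
    """Hours to read data_tib once at the aggregate baseline throughput."""
    aggregate_mib_s = volume_count * sc1_baseline_mib_s(volume_tib)
    return data_tib * MIB_PER_TIB / aggregate_mib_s / 3600


# Hypothetical layout: 10 TiB of HDFS data spread over ten 1 TiB sc1 volumes.
print(f"{hours_to_scan(10, 10, 1):.1f} hours for a single full pass")
```

Under these assumptions a single sequential pass over the HDFS data takes roughly a day, which is why moving to sc1 is unlikely to keep nightly runtimes similar.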
Data Store Management