AWS Certified Data Engineer Associate DEA-C01 Practice Question

An analytics team runs a Spark-based Amazon EMR cluster every night to transform 50 TB of web logs. The data is transient and can be regenerated at any time. The current long-running cluster keeps 10 TB of HDFS data on EBS volumes attached to core nodes, leading to high monthly storage charges. The team wants to keep job runtimes similar while minimizing storage cost. Which solution best meets these requirements?

  • Launch a transient EMR cluster that uses instance store-backed EC2 nodes (for example, r5d or i3) for HDFS and terminate the cluster after the nightly job completes.

  • Retain the long-running cluster but remove all local storage and access the data only through EMRFS consistent view backed by Amazon S3.

  • Keep the existing cluster and replace the current EBS volumes with lower-cost sc1 EBS volumes for HDFS.

  • Replace the Amazon EMR workload with AWS Glue Spark jobs that read from and write to Amazon S3.
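The first option depends on two cluster settings. Core nodes use an instance family with local instance store for HDFS, and the cluster shuts itself down once its steps finish. The Python sketch below shows one way to express that. It only builds the dictionary of keyword arguments that boto3's EMR `run_job_flow` call accepts and does not contact AWS. The release label, instance sizes and counts, cluster and step names, and the S3 script path are hypothetical placeholders, not values taken from the question.

```python
"""Sketch: request body for a transient, instance-store-backed EMR cluster.

Hypothetical values (not from the question): cluster/step names, release
label, instance sizes and counts, and the S3 script path. The keys mirror
the keyword arguments of boto3's EMR client method `run_job_flow`.
"""

import json
from typing import Any, Dict, List


def build_transient_cluster_request(
    script_uri: str,
    core_count: int = 10,              # hypothetical node count
    core_type: str = "r5d.4xlarge",    # 'd' variant: local NVMe instance store
) -> Dict[str, Any]:
    instance_groups: List[Dict[str, Any]] = [
        {
            "Name": "Primary",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": "r5d.xlarge",
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": core_type,
            "InstanceCount": core_count,
            # No "EbsConfiguration" key: this request asks for no extra EBS
            # data volumes, so HDFS would sit on the instance store.
        },
    ]
    return {
        "Name": "nightly-weblog-transform",   # hypothetical name
        "ReleaseLabel": "emr-6.15.0",         # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": instance_groups,
            # False: terminate the cluster after all steps complete.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "transform-web-logs",  # hypothetical step name
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EMR role names
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_transient_cluster_request("s3://example-bucket/jobs/transform.py")
    print(json.dumps(request["Instances"], indent=2))
    # With AWS credentials configured, the cluster could be launched with:
    #   import boto3
    #   boto3.client("emr").run_job_flow(**request)
```

Instance-store HDFS is lost when the cluster terminates. That is acceptable in this scenario only because the question states the data is transient and can be regenerated at any time.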

Domain: Data Store Management