AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineering team runs a persistent Amazon EMR cluster that stores intermediate data in HDFS. Each night, about 50 TB of gzip log files arrive in an Amazon S3 bucket and must be copied into HDFS before downstream MapReduce jobs start. The transfer must maximize throughput, minimize S3 request costs, and run using only the existing EMR cluster's resources. Which solution meets these requirements?

  • Use AWS DataSync to transfer the objects to volumes on each core node, then import the data into HDFS.

  • Add an EMR step that uses S3DistCp to copy the objects from Amazon S3 to HDFS in parallel.

  • Mount the S3 bucket on every core node with s3fs and move the objects to HDFS with the Linux cp command.

  • From the master node, run the AWS CLI command "aws s3 cp --recursive" to copy the objects into HDFS.
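For reference, an S3DistCp copy is submitted to an existing EMR cluster as a step that runs command-runner.jar. The sketch below uses boto3 to add such a step; the cluster ID, bucket name, and HDFS destination are placeholders, not values taken from the question.

    import boto3

    emr = boto3.client("emr")

    # Add a step to an already-running cluster (placeholder cluster ID).
    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",
        Steps=[{
            "Name": "Copy nightly logs from S3 to HDFS with S3DistCp",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "s3-dist-cp",
                    "--src=s3://example-log-bucket/nightly/",   # placeholder source bucket/prefix
                    "--dest=hdfs:///data/logs/",                # placeholder HDFS target
                    "--srcPattern=.*\\.gz",                     # copy only the gzip log files
                ],
            },
        }],
    )

S3DistCp runs as a distributed MapReduce job on the cluster itself, so the copy is parallelized across the existing core nodes rather than funneled through a single host.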

Domain: Data Ingestion and Transformation