AWS Certified Data Engineer Associate DEA-C01 Practice Question

An AWS data engineering team operates multiple Amazon EMR clusters that generate hundreds of gigabytes of Spark and YARN logs daily. Security analysts must retain these logs for 12 months and run ad-hoc SQL queries against them. The solution should minimize EMR cluster overhead and rely only on fully managed AWS services. Which architecture satisfies these requirements?

  • Install Filebeat on each EMR core node to ship logs to a self-managed Elasticsearch cluster on Amazon EC2; retain data for a year and analyze it with Kibana.

  • Enable EMRFS consistent view and write all logs directly to HDFS; periodically run Hive queries on the cluster to analyze the logs and copy results to Amazon S3 for retention.

  • Send EMR application and system logs to Amazon CloudWatch Logs, use a subscription filter to stream the logs through Amazon Kinesis Data Firehose into an Amazon S3 data lake bucket, catalog the data with AWS Glue, and query with Amazon Athena.

  • Store logs on local disks and use a cron job to transfer them to an on-premises Hadoop cluster for year-long retention and analysis with Presto.
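In the CloudWatch Logs option, a subscription filter writes each batch of log events to Firehose as a gzip-compressed JSON document. Inside a Firehose data-transformation Lambda event, those bytes arrive base64-encoded. The Python sketch below covers only the decoding and flattening step, assuming such a transformation stage exists. It turns one batch into newline-delimited JSON, which an AWS Glue table and Athena could then read as one event per row. It also shows the shape of an S3 lifecycle rule that expires objects after 365 days to cover the 12-month retention. The output field names (`log_group`, `timestamp_ms`, ...), the rule ID, the prefix, and the sample log group are hypothetical. Nothing here calls AWS, and the Lambda response envelope (`recordId`, `result`, `data`) is omitted.

```python
import base64
import gzip
import json


def decode_subscription_record(b64_data: str) -> dict:
    """Decode one CloudWatch Logs subscription payload.

    Assumes the input is the base64 string found in a Firehose
    transformation event; the underlying bytes are gzip-compressed JSON.
    """
    return json.loads(gzip.decompress(base64.b64decode(b64_data)))


def to_ndjson(payload: dict) -> str:
    """Flatten the logEvents of a DATA_MESSAGE into newline-delimited JSON.

    CONTROL_MESSAGE payloads only check that the destination is reachable
    and carry no log events, so they (and empty batches) yield "".
    """
    if payload.get("messageType") != "DATA_MESSAGE":
        return ""
    lines = []
    for event in payload.get("logEvents", []):
        # Hypothetical column names for the Glue/Athena table.
        row = {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp_ms": event["timestamp"],
            "message": event["message"],
        }
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n" if lines else ""


def lifecycle_rule(prefix: str, days: int = 365) -> dict:
    """Build a LifecycleConfiguration dict in the shape accepted by boto3's
    s3.put_bucket_lifecycle_configuration. It is only built here, not sent."""
    return {
        "Rules": [
            {
                "ID": "expire-emr-logs",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical sample payload; names and IDs are made up for illustration.
    sample = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",
        "logGroup": "/emr/example-cluster/yarn",
        "logStream": "node-1",
        "subscriptionFilters": ["emr-to-firehose"],
        "logEvents": [
            {"id": "1", "timestamp": 1700000000000, "message": "INFO container started"},
            {"id": "2", "timestamp": 1700000001000, "message": "WARN executor lost"},
        ],
    }
    encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8"))).decode("ascii")
    print(to_ndjson(decode_subscription_record(encoded)), end="")
    print(lifecycle_rule("emr-logs/"))
```

Emitting one JSON object per line is a deliberate choice here. Athena's JSON SerDe reads one record per line, whereas the raw subscription payload nests many events inside a single document.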

Domain: Data Security and Governance