AWS Certified Data Engineer Associate DEA-C01 Practice Question
An AWS data engineering team operates multiple Amazon EMR clusters that generate hundreds of gigabytes of Spark and YARN logs daily. Security analysts must retain these logs for 12 months and run ad-hoc SQL queries against them. The solution should minimize EMR cluster overhead and rely only on fully managed AWS services. Which architecture satisfies these requirements?
Install Filebeat on each EMR core node to ship logs to a self-managed Elasticsearch cluster on Amazon EC2; retain data for a year and analyze it with Kibana.
Enable EMRFS consistent view and write all logs directly to HDFS; periodically run Hive queries on the cluster to analyze the logs and copy results to Amazon S3 for retention.
Send EMR application and system logs to Amazon CloudWatch Logs, use a subscription filter to stream the logs through Amazon Kinesis Data Firehose into an Amazon S3 data lake bucket, catalog the data with AWS Glue, and query with Amazon Athena.
Store logs on local disks and use a cron job to transfer them to an on-premises Hadoop cluster for year-long retention and analysis with Presto.
Streaming the EMR application and system logs to Amazon CloudWatch Logs moves log storage and analysis off the clusters. A CloudWatch Logs subscription filter forwards the data to Amazon Kinesis Data Firehose, which buffers the records and delivers them in batches to an Amazon S3 bucket, optionally compressing them on the way. After the data is cataloged with AWS Glue, the analysts can issue ad-hoc SQL queries in Amazon Athena. All services used are fully managed, and S3 keeps the objects until they are deleted or expired by a lifecycle rule, so it meets the 12-month retention requirement.
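As a rough illustration of the wiring described above, the following sketch builds the request parameters for two of the steps: the CloudWatch Logs subscription filter that targets Firehose, and an S3 lifecycle rule that expires log objects only after the retention window. The filter name, rule ID, bucket, prefix, and ARNs are hypothetical placeholders, and the 366-day default is an assumption chosen to cover a full 12 months in a leap year. The functions only construct dictionaries; the boto3 calls that would consume them are shown in comments.

```python
def subscription_filter_request(log_group, firehose_arn, role_arn):
    """Keyword arguments for the CloudWatch Logs PutSubscriptionFilter API."""
    return {
        "logGroupName": log_group,
        "filterName": "emr-logs-to-firehose",  # hypothetical name
        "filterPattern": "",  # an empty pattern matches every log event
        "destinationArn": firehose_arn,  # Firehose delivery stream ARN
        "roleArn": role_arn,  # role CloudWatch Logs assumes to write to Firehose
    }


def retention_lifecycle_request(bucket, prefix, days=366):
    """Keyword arguments for the S3 PutBucketLifecycleConfiguration API.

    Objects under `prefix` expire after `days`; 366 is an assumed value
    that keeps every object for at least 12 months.
    """
    return {
        "Bucket": bucket,
        "LifecycleConfiguration": {
            "Rules": [
                {
                    "ID": "expire-emr-logs",  # hypothetical rule ID
                    "Status": "Enabled",
                    "Filter": {"Prefix": prefix},
                    "Expiration": {"Days": days},
                }
            ]
        },
    }


# With credentials configured, the requests could be sent like this:
#   import boto3
#   boto3.client("logs").put_subscription_filter(**subscription_filter_request(...))
#   boto3.client("s3").put_bucket_lifecycle_configuration(**retention_lifecycle_request(...))
```

Keeping the request construction separate from the API call makes the parameters easy to review before anything is changed in an account.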
Keeping logs in HDFS and querying them with Hive on the running clusters increases cluster load and requires the clusters to stay online. Shipping logs to an on-premises Hadoop environment or to a self-managed Elasticsearch cluster on Amazon EC2 adds infrastructure to provision and maintain, which violates the requirement to rely only on fully managed AWS services.
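In contrast to on-cluster Hive queries, an ad-hoc query in the Athena-based design runs without touching any EMR cluster. The sketch below builds the parameters for Athena's StartQueryExecution API. The database, table, column names (`logstream`, `message`), and the `dt` partition column are assumptions for illustration; the real names depend on the schema registered in the Glue Data Catalog.

```python
from datetime import date


def athena_query_request(database, table, output_location, day):
    """Keyword arguments for the Athena StartQueryExecution API.

    `day` must be an ISO date string (YYYY-MM-DD); it is validated before
    being placed in the SQL text so arbitrary input cannot reach the query.
    """
    day = date.fromisoformat(day).isoformat()  # raises ValueError if malformed
    sql = (
        f'SELECT logstream, message FROM "{table}" '  # hypothetical columns
        f"WHERE dt = '{day}' "  # hypothetical partition column
        "AND message LIKE '%ERROR%' LIMIT 100"
    )
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# With credentials configured, the query could be submitted like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**athena_query_request(...))
```

Filtering on a partition column, if the table has one, limits how much data Athena scans for each query.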
Exam domain: Data Security and Governance