AWS Certified Data Engineer Associate DEA-C01 Practice Question

Your company stores about 14 TB of CloudTrail JSON logs daily in an S3 bucket. Auditors need to run ad-hoc Athena queries on the last 90 days of logs while minimizing storage and scan costs. A nightly job must convert the logs to partitioned Parquet, update the Glue Data Catalog, respect existing Lake Formation table permissions, and scale automatically without any cluster management. Which solution meets these requirements?

  • Enable CloudTrail Lake, import the S3 log bucket each night, and let auditors query the event data store while exporting results to Athena when necessary.

  • Maintain a persistent EMR on EC2 cluster with a cron-based Spark step that converts and catalogs the logs and scales the cluster with EMR auto-scaling policies.

  • Schedule an Amazon EMR Serverless Spark job with EventBridge to convert the JSON logs to date-partitioned Parquet in S3, update the Glue Data Catalog, and rely on Lake Formation for table governance.

  • Trigger nightly Athena CREATE TABLE AS SELECT statements from an AWS Lambda function to convert the JSON logs to Parquet and add partitions with MSCK REPAIR TABLE.
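For readers unfamiliar with the term, "date-partitioned" output usually means a Hive-style key=value folder layout in S3, which lets Athena skip partitions outside the queried date range. The sketch below only illustrates that layout and the 90-day window from the question. The bucket name, the prefix, and the year/month/day key names are assumptions made for illustration and do not come from the question.

```python
from datetime import date, timedelta

# Hypothetical bucket and prefix, used only for illustration.
BASE = "s3://example-audit-lake/cloudtrail_parquet"


def partition_prefix(day: date, base: str = BASE) -> str:
    """Return the Hive-style partition path for one day of logs."""
    return f"{base}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def partitions_in_window(end: date, days: int = 90) -> list[str]:
    """List the partition paths for the `days` days ending at `end`, newest first."""
    return [partition_prefix(end - timedelta(days=offset)) for offset in range(days)]


if __name__ == "__main__":
    # Show the three most recent partitions an auditor's query would touch.
    for path in partitions_in_window(date.today())[:3]:
        print(path)
```

A query that filters on these partition keys reads only the matching folders, which is where the scan-cost saving over unpartitioned JSON comes from.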

Data Security and Governance