AWS Certified Data Engineer Associate DEA-C01 Practice Question
Your company stores about 14 TB of CloudTrail JSON logs daily in an S3 bucket. Auditors need to run ad-hoc Athena queries on the last 90 days of logs while minimizing storage and scan cost. A nightly job must convert the logs to partitioned Parquet, update the Glue Data Catalog, respect existing Lake Formation table permissions, and scale automatically without any cluster management. Which solution meets these requirements?
Enable CloudTrail Lake, import the S3 log bucket each night, and let auditors query the event data store while exporting results to Athena when necessary.
Trigger nightly Athena CREATE TABLE AS SELECT statements from an AWS Lambda function to convert the JSON logs to Parquet and add partitions with MSCK REPAIR TABLE.
Maintain a persistent EMR on EC2 cluster with a cron-based Spark step that converts and catalogs the logs and scales the cluster with EMR auto-scaling policies.
Schedule an Amazon EMR Serverless Spark job with EventBridge to convert the JSON logs to date-partitioned Parquet in S3, update the Glue Data Catalog, and rely on Lake Formation for table governance.
An Amazon EMR Serverless Spark application removes the need to provision or maintain clusters and scales workers up or down automatically, so the EMR Serverless option is correct. A nightly EventBridge schedule can start a Spark job that reads the JSON objects in S3, writes date-partitioned Parquet back to S3 using the EMRFS S3-optimized committer, and updates the Glue Data Catalog so the existing table stays current. Because the job uses the Glue Data Catalog as its metastore, the existing Lake Formation grants continue to protect the table, and auditors can query the new partitions with Athena right away.
The other options each miss a requirement. Athena CTAS driven by Lambda is constrained by query run-time and concurrency limits and rescans the full JSON set on every run. A long-running EMR on EC2 cluster still requires lifecycle management and incurs cost even when idle. CloudTrail Lake is a separate service with its own event data store and query interface, so it would duplicate storage and does not satisfy the stated constraints.
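To make the nightly flow more concrete, here is a minimal Python sketch of two pieces a scheduled launcher might need: mapping a CloudTrail object key to a date partition, and assembling the parameters for an EMR Serverless job run. It assumes the standard CloudTrail key layout (AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/). The function names, the year/month/day partition scheme, the job name, and the script location are hypothetical and not part of the question.

```python
import re
from datetime import date

# Assumes the standard CloudTrail delivery layout:
#   AWSLogs/<account-id>/CloudTrail/<region>/YYYY/MM/DD/<file>.json.gz
KEY_DATE = re.compile(r"/CloudTrail/[^/]+/(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/")


def partition_path(key: str) -> str:
    """Map a CloudTrail object key to a Hive-style date partition path.

    The year/month/day scheme is an illustrative choice, not one
    prescribed by the question.
    """
    match = KEY_DATE.search(key)
    if match is None:
        raise ValueError(f"not a CloudTrail log key: {key}")
    return "year={y}/month={m}/day={d}".format(**match.groupdict())


def build_job_run_request(application_id: str, execution_role_arn: str,
                          script_uri: str, day: date) -> dict:
    """Assemble keyword arguments for the EMR Serverless StartJobRun call.

    script_uri points at a hypothetical PySpark script that reads one day
    of JSON and writes Parquet; the day is passed to it as an argument.
    """
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "name": f"cloudtrail-to-parquet-{day.isoformat()}",  # hypothetical job name
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_uri,
                # The script would use this to select the day's input prefix.
                "entryPointArguments": [day.strftime("%Y/%m/%d")],
            }
        },
    }
```

A scheduled launcher could pass the resulting dictionary to `boto3.client("emr-serverless").start_job_run(**request)`. The Spark script itself, which would read the JSON and write the partitioned Parquet, is not shown here.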
Data Security and Governance