AWS Certified Data Engineer Associate DEA-C01 Practice Question
An ecommerce company stores raw clickstream events in Amazon S3 as date-partitioned Parquet files cataloged by AWS Glue. A data engineer must automate a daily SQL aggregation that joins about 3 TB of data and appends the results to another S3 bucket for downstream dashboards. The solution must automatically register the new dataset in the Data Catalog and minimize cost and operational overhead. Which solution meets these requirements?
Use AWS Step Functions to launch an on-demand Amazon EMR cluster running Hive, execute the aggregation job, copy the results to the target bucket, and then terminate the cluster.
Create an AWS Glue Spark ETL job triggered by EventBridge to perform the aggregation and write the results to the target bucket; schedule a separate Glue crawler to update the Data Catalog.
Load the source Parquet data into Amazon Redshift Serverless, schedule a query to create the aggregated table, and use UNLOAD to export the results back to the target S3 bucket.
Configure an Amazon EventBridge rule to run a scheduled Amazon Athena CTAS query that writes partitioned Parquet output to the target S3 bucket and automatically creates the corresponding table in the AWS Glue Data Catalog.
Amazon Athena is serverless and bills each query by the amount of data it scans. Because the source data is already columnar Parquet partitioned by date, a query that filters on the partition column reads only the partitions and columns it needs, which makes Athena the lowest-cost of the four options. A CREATE TABLE AS SELECT (CTAS) statement, run on a schedule through Amazon EventBridge, writes the aggregated, partitioned Parquet output directly to the target S3 bucket. The CTAS operation also registers the new table and its partitions in the AWS Glue Data Catalog, so no additional crawler or cluster management is required.
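The scheduled CTAS approach described above can be sketched roughly as follows. This is a minimal illustration, not the question author's implementation: the table names, bucket name, column names (`page_id`, `event_date`), and the one-table-per-day naming scheme are all hypothetical, the join from the scenario is reduced to a single-table aggregation, and `event_date` is assumed to be a string partition column. The only AWS call used is boto3's Athena `start_query_execution`, which a scheduled target such as a Lambda function could invoke.

```python
from datetime import date


def build_ctas_query(run_date: date, source_table: str,
                     target_table: str, target_bucket: str) -> str:
    """Build a CTAS statement for one day's aggregation.

    All table, bucket, and column names are hypothetical placeholders.
    """
    day = run_date.isoformat()            # e.g. 2024-01-15
    suffix = run_date.strftime("%Y%m%d")  # e.g. 20240115
    return (
        # CTAS fails if the table already exists, so this sketch
        # creates one table per run date (an assumed design choice).
        f"CREATE TABLE {target_table}_{suffix}\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = 's3://{target_bucket}/daily_agg/{day}/',\n"
        f"  partitioned_by = ARRAY['event_date']\n"
        f") AS\n"
        # Partition columns must come last in the SELECT list.
        f"SELECT page_id, COUNT(*) AS clicks, event_date\n"
        f"FROM {source_table}\n"
        # Filtering on the partition column limits the data scanned.
        f"WHERE event_date = '{day}'\n"
        f"GROUP BY page_id, event_date"
    )


def submit_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query to Athena and return its execution ID."""
    import boto3  # imported lazily so the builder works without AWS access

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Because CTAS always creates a new table, a pipeline that must append to a single existing table would more likely run the CTAS once and then use Athena's `INSERT INTO` for the daily loads; the sketch keeps to CTAS to match the explanation.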
An on-demand EMR cluster must be provisioned, configured, and paid for while it runs, even for a short daily job, and that workflow adds a separate copy step. A Glue Spark job is serverless but is billed per DPU-hour, which typically costs more for this workload, and it still needs a separate crawler or extra code to register partitions. Loading the data into Redshift Serverless introduces an extra data-loading step and higher storage costs, and UNLOAD does not register its output in the Glue Data Catalog. The scheduled Athena CTAS query therefore best meets the cost, automation, and cataloging requirements.
Data Operations and Support