AWS Certified Data Engineer Associate DEA-C01 Practice Question
An analytics team receives new CSV files each night in the s3://sales/raw/ prefix. They must run a PySpark script that cleanses the data and writes partitioned Parquet files to s3://sales/curated/. The solution must 1) run automatically on a fixed schedule, 2) avoid reprocessing files that have already been converted, and 3) minimize infrastructure management. Which approach meets these requirements?
Use an Amazon EMR cluster launched each night by an AWS Lambda function, run the PySpark script as a step, then terminate the cluster.
Create an Amazon Athena scheduled query that uses CTAS to read the CSV files and write Parquet output to the curated prefix.
Load the CSV files into Amazon Redshift with the COPY command and run SQL transformations through the Redshift Scheduler.
Configure an AWS Glue Spark job that reads the raw prefix, enable job bookmarks, and invoke the job nightly with an Amazon EventBridge cron schedule.
An AWS Glue Spark job provides a fully managed serverless environment for running PySpark code, so the team does not need to provision or maintain clusters. Job bookmarks track previously processed input files, preventing the job from re-reading data that was already converted. An Amazon EventBridge cron rule can invoke the Glue job on the required nightly schedule, meeting the automation requirement without additional components.
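To make the bookmark idea concrete, here is a toy model in plain Python. It is only a conceptual sketch: `ToyBookmark` and `nightly_run` are hypothetical names, not Glue APIs, and Glue's real bookmark state is managed by the service rather than kept in a set like this.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBookmark:
    """Hypothetical stand-in for bookmark state; not a Glue API."""
    processed: set = field(default_factory=set)

    def select_new(self, keys):
        # Return only the keys that no earlier run has committed.
        return sorted(k for k in keys if k not in self.processed)

    def commit(self, keys):
        # Record the keys so that later runs skip them.
        self.processed.update(keys)


def nightly_run(bookmark, listed_keys):
    """Simulate one scheduled run over the current listing of the raw prefix."""
    new_keys = bookmark.select_new(listed_keys)
    # A real job would cleanse these files and write Parquet here.
    bookmark.commit(new_keys)
    return new_keys


bookmark = ToyBookmark()
night1 = nightly_run(bookmark, ["raw/2024-01-01.csv"])
night2 = nightly_run(bookmark, ["raw/2024-01-01.csv", "raw/2024-01-02.csv"])
```

In this model the second run picks up only the file that arrived after the first run. In an actual Glue PySpark script, the equivalent bookkeeping depends on calling `job.init(...)` at the start, passing a `transformation_ctx` to the source read, and calling `job.commit()` at the end.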
The other options each fail at least one requirement. Athena scheduled queries cannot execute PySpark and do not offer built-in state tracking comparable to bookmarks. Launching and terminating an EMR cluster each night adds significant operational overhead compared with Glue's serverless model, and the team would still have to track processed files itself. Loading the files into Redshift and transforming them with SQL moves the workload out of S3, does not use PySpark, and introduces unnecessary administration of a data warehouse. AWS Glue with job bookmarks and an EventBridge schedule is therefore the lowest-maintenance option that satisfies all three requirements.
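The configuration for the Glue option might look roughly like the following. This is a hedged sketch, not a tested deployment: the job name, role ARNs, script location, worker sizing, and the 02:00 UTC cron expression are all assumptions, and the schedule is expressed for EventBridge Scheduler (calling Glue's `StartJobRun` through a universal target) as one possible way to realize the "EventBridge cron schedule" in the answer. The functions only build request arguments and make no AWS calls.

```python
import json


def glue_job_args(job_name, role_arn, script_location):
    """Build keyword arguments for the Glue create_job call (assumed sizing)."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # This job parameter turns job bookmarks on.
        "DefaultArguments": {"--job-bookmark-option": "job-bookmark-enable"},
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


def nightly_schedule_args(schedule_name, job_name, scheduler_role_arn):
    """Build keyword arguments for an EventBridge Scheduler create_schedule call."""
    return {
        "Name": schedule_name,
        "ScheduleExpression": "cron(0 2 * * ? *)",  # assumed: 02:00 UTC daily
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            # Universal target that invokes the Glue StartJobRun API.
            "Arn": "arn:aws:scheduler:::aws-sdk:glue:startJobRun",
            "RoleArn": scheduler_role_arn,
            "Input": json.dumps({"JobName": job_name}),
        },
    }


# All identifiers below are hypothetical placeholders.
job_args = glue_job_args(
    "sales-nightly-cleanse",
    "arn:aws:iam::123456789012:role/GlueJobRole",
    "s3://sales/scripts/cleanse.py",
)
schedule_args = nightly_schedule_args(
    "sales-nightly-cleanse-schedule",
    "sales-nightly-cleanse",
    "arn:aws:iam::123456789012:role/SchedulerInvokeGlueRole",
)
```

With boto3 installed and credentials configured, these dictionaries could be passed as `boto3.client("glue").create_job(**job_args)` and `boto3.client("scheduler").create_schedule(**schedule_args)`. A native Glue scheduled trigger would be another way to run the job nightly.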
Exam domain: Data Operations and Support