AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company receives a 500 MB CSV file from a partner each night. The file must land in Amazon S3, then be converted to Parquet and stored in a different S3 bucket. The data team wants a fully managed solution that runs at 02:00 UTC every day and sends failure notifications. Which approach meets these requirements with the least operational effort?
Use AWS Data Pipeline to spin up an Amazon EMR cluster on a daily schedule, run a Spark job that converts the file, and terminate the cluster when complete.
Run a cron job on an Amazon EC2 instance that downloads the file, converts it locally with a Python script, uploads the Parquet file to the destination bucket, and sends CloudWatch alarm emails on failure.
Create an AWS Transfer Family SFTP endpoint that writes to an S3 landing prefix. Configure an Amazon EventBridge rule to trigger an AWS Glue job at 02:00 UTC to convert the CSV to Parquet and publish errors to an Amazon SNS topic.
Configure an S3 event notification on the landing bucket to invoke an AWS Lambda function that reads the CSV object, converts it to Parquet, and writes the result to another bucket.
AWS Transfer Family presents a managed SFTP endpoint that writes the partner's file directly to Amazon S3, so there are no servers to manage. A scheduled Amazon EventBridge rule can start an AWS Glue ETL job at 02:00 UTC that reads the CSV object, converts it to Parquet, and writes it to the target bucket. The Glue job can be configured with retries, and its failures can be published to an Amazon SNS topic, which gives a fully managed, serverless, scheduled pipeline with notifications.

The other options involve more operational effort or miss a requirement. AWS Data Pipeline with Amazon EMR still requires managing the cluster lifecycle and a Spark job just to convert one file. The S3 event-triggered Lambda function runs when the object arrives rather than on the 02:00 UTC schedule, includes no failure notification, and is bounded by Lambda's 15-minute timeout and its memory and /tmp storage settings (512 MB of /tmp by default) when processing a 500 MB file. The EC2-based cron job provides full control but adds the most operational overhead, because the team must patch, monitor, and keep the instance available.
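To make the scheduling and notification pieces of the explanation concrete, the sketch below builds the two values an EventBridge setup of this kind would typically need: a cron expression for 02:00 UTC each day, and an event pattern that matches failed runs of the Glue job so they can be routed to SNS. The job name and topic ARN are hypothetical placeholders, not values from the question, and the code only constructs the configuration; it does not call AWS.

```python
import json

# Hypothetical names for illustration only; neither appears in the question.
GLUE_JOB_NAME = "csv-to-parquet-nightly"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:etl-failures"


def daily_utc_cron(hour: int, minute: int = 0) -> str:
    """Return an EventBridge cron expression that fires once a day (UTC)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minutes hours day-of-month month day-of-week year.
    # EventBridge requires '?' in one of the two day fields.
    return f"cron({minute} {hour} * * ? *)"


def glue_failure_pattern(job_name: str) -> dict:
    """Return an event pattern matching failed or timed-out runs of one Glue job."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": [job_name],
            "state": ["FAILED", "TIMEOUT"],
        },
    }


schedule_expression = daily_utc_cron(2)
failure_pattern = glue_failure_pattern(GLUE_JOB_NAME)

print(schedule_expression)
print(json.dumps(failure_pattern, indent=2))
```

In a real deployment these values would likely be supplied as the `ScheduleExpression` of the scheduled rule and the `EventPattern` of a second rule whose target is the SNS topic, for example through boto3's `put_rule` and `put_targets` or an infrastructure-as-code template. That wiring is an assumption about one reasonable setup, not something the question prescribes.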
Exam domain: Data Ingestion and Transformation