AWS Certified Data Engineer Associate DEA-C01 Practice Question
A company stores CSV files of about 50 MB each in an Amazon S3 landing bucket every 5 minutes. A data engineer must automatically convert each file to Parquet, add an ingestion_timestamp column, and write the result to a separate S3 bucket organized by date and hour. The solution must remain fully serverless, minimize cost, and require little operational management. Which approach meets these requirements?
Configure an S3 event notification that invokes an AWS Lambda function using the AWS SDK for pandas to read the CSV object, add the timestamp column, and write a Parquet file to the destination bucket's date/hour prefix.
Use a Lambda function to start an on-demand Amazon EMR cluster that runs a conversion script and terminates the cluster after processing each batch of files.
Trigger an AWS Step Functions state machine from the S3 event that starts an AWS Glue Spark job to perform the conversion and load the result to the destination bucket.
Send the files to an Amazon Kinesis Data Firehose delivery stream that uses a Lambda transformation to convert records to Parquet before writing to the target bucket.
Invoking an AWS Lambda function directly from an S3 event notification keeps the entire workflow serverless and avoids a separate orchestration service. A 50 MB object fits comfortably within Lambda's limits of 10 GB of memory and a 15-minute timeout. With the AWS SDK for pandas (AWS Data Wrangler) packaged in the function, the engineer can read the CSV, add the ingestion_timestamp column, and write Parquet directly to the date/hour prefix in the destination bucket. The other options add cost or operational overhead: an on-demand EMR cluster is infrastructure that must be launched and terminated for every batch, Step Functions plus an AWS Glue Spark job adds orchestration and Spark startup overhead for a small file, and Kinesis Data Firehose is designed to ingest streaming records rather than objects that already sit in S3.
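The Lambda approach described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the destination bucket name, the `date=`/`hour=` prefix layout, and the helper `build_destination_path` are assumptions made for the example, and the function assumes the AWS SDK for pandas (`awswrangler`) is available to the function, for example through a Lambda layer.

```python
import datetime as dt
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Hypothetical destination bucket; replace with the real target bucket.
DEST_BUCKET = "example-curated-bucket"


def build_destination_path(key: str, now: dt.datetime) -> str:
    """Build a date/hour-partitioned Parquet path for a source CSV key."""
    stem = PurePosixPath(key).stem  # file name without the .csv extension
    return f"s3://{DEST_BUCKET}/date={now:%Y-%m-%d}/hour={now:%H}/{stem}.parquet"


def handler(event, context):
    """Entry point for the S3 event notification (one record per new object)."""
    # Imported here so the path helper can be exercised without the library.
    import awswrangler as wr

    now = dt.datetime.now(dt.timezone.utc)
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])

        df = wr.s3.read_csv(path=f"s3://{bucket}/{key}")
        df["ingestion_timestamp"] = now  # same UTC timestamp for every row

        dest = build_destination_path(key, now)
        wr.s3.to_parquet(df=df, path=dest)
        written.append(dest)
    return {"written": written}
```

Deriving the partition from the processing time is one possible choice; if the date and hour should instead reflect when the file landed, the `eventTime` field of each S3 event record could be parsed and used in place of `now`.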
AWS Certified Data Engineer Associate DEA-C01
Data Operations and Support