AWS Certified Data Engineer Associate DEA-C01 Practice Question
An e-commerce company stores hourly Apache Parquet files in an Amazon S3 prefix. A partner accepts only CSV files with a header row. About 5 GB arrives daily. The data engineer needs an automated, cost-effective pipeline that converts every newly created Parquet object to CSV and writes it to another S3 prefix. The team prefers minimal custom code and automatic scaling. Which approach meets these requirements?
Configure an Amazon Kinesis Data Firehose delivery stream with the S3 bucket as the source and enable record format conversion from Parquet to CSV.
COPY the Parquet files into an Amazon Redshift staging table, then UNLOAD the data in CSV format to the target S3 prefix on a schedule managed by Amazon EventBridge.
Use Amazon S3 Event Notifications to invoke an AWS Lambda function that leverages the pandas library to download, convert, and upload each file in CSV format.
Enable EventBridge for the S3 bucket and add a rule that starts an AWS Glue Studio ETL job. The job loads the Parquet object as a DynamicFrame and writes it as CSV with headers to the target prefix.
An AWS Glue Studio ETL job is serverless, auto-scales, and generates most of the Spark code automatically. By enabling S3 event delivery to Amazon EventBridge and adding a rule that targets the Glue job, each new Parquet object triggers a run. The job reads the object as a DynamicFrame and writes a CSV file with a single header row to the target prefix. Glue charges only for job-run time and requires no cluster management.
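The wiring described above can be sketched in plain Python. This is a minimal illustration rather than the exam's reference solution. The bucket name, the prefixes, the job-argument names (`--source_path`, `--target_path`) and the helper function names are all hypothetical. The event shape follows the S3 "Object Created" events that EventBridge delivers. How the rule actually reaches the job (for example, through a Glue workflow trigger) is left out. The last helper returns the settings a Glue script would typically pass to `glueContext.write_dynamic_frame.from_options` to produce CSV with a header row.

```python
from typing import Optional


def build_rule_pattern(bucket: str, source_prefix: str) -> dict:
    """EventBridge event pattern for new objects under one S3 prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": source_prefix}]},
        },
    }


def job_arguments_from_event(event: dict, target_prefix: str) -> Optional[dict]:
    """Map an S3 'Object Created' event to (hypothetical) Glue job arguments.

    Returns None for objects that are not Parquet files, so marker files
    written under the same prefix do not start a conversion.
    """
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    if not key.endswith(".parquet"):
        return None
    return {
        "--source_path": f"s3://{bucket}/{key}",
        "--target_path": f"s3://{bucket}/{target_prefix}",
    }


def csv_sink_settings(target_path: str) -> dict:
    """Keyword arguments for a Glue CSV sink that writes a header row."""
    return {
        "connection_type": "s3",
        "connection_options": {"path": target_path},
        "format": "csv",
        "format_options": {"writeHeader": True},
    }


# Example event, trimmed to the fields used above (values are made up).
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "parquet/2024-01-01-00.parquet", "size": 1024},
    },
}

arguments = job_arguments_from_event(sample_event, "csv/")
```

Inside the Glue job itself, the script would read `--source_path` with `getResolvedOptions`, load that object as a DynamicFrame, and write it with the settings from `csv_sink_settings`.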
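Loading data into Amazon Redshift first introduces continuous cluster costs and extra data movement. Kinesis Data Firehose cannot ingest directly from an S3 bucket, and its format conversion supports only JSON to Parquet/ORC, so it can't perform the required transformation. A Lambda function that uses pandas would have to load each hourly file fully into memory. The files average only about 200 MB if the 5 GB arrives evenly across the day, but Parquet data expands when decoded, so a larger-than-usual file moves the function toward Lambda's 10 GB memory and 15-minute timeout ceilings. The function also needs pandas and a Parquet engine packaged with it, which means more custom code than Glue Studio.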
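A quick back-of-the-envelope check of the per-file size helps put the Lambda concern in context. The sketch below assumes the 5 GB arrives evenly across 24 hourly files. The in-memory expansion factor is a made-up illustrative value, not a measured one. The Lambda ceilings used are the documented maximums of 10,240 MB of memory and 900 seconds of run time.

```python
DAILY_VOLUME_MB = 5 * 1024        # 5 GB per day, from the question
FILES_PER_DAY = 24                # one Parquet file per hour
LAMBDA_MAX_MEMORY_MB = 10_240     # Lambda's maximum configurable memory
LAMBDA_MAX_TIMEOUT_S = 900        # Lambda's maximum timeout (15 minutes)
ASSUMED_EXPANSION = 5             # hypothetical Parquet-to-DataFrame growth factor


def average_file_mb(daily_mb: float, files_per_day: int) -> float:
    """Average size of one file if the daily volume is spread evenly."""
    return daily_mb / files_per_day


def estimated_memory_mb(file_mb: float, expansion: float) -> float:
    """Rough in-memory footprint of one file after decoding."""
    return file_mb * expansion


file_mb = average_file_mb(DAILY_VOLUME_MB, FILES_PER_DAY)
memory_mb = estimated_memory_mb(file_mb, ASSUMED_EXPANSION)
fits_in_lambda = memory_mb < LAMBDA_MAX_MEMORY_MB

print(f"average file: {file_mb:.1f} MB, estimated memory: {memory_mb:.1f} MB")
```

Under these assumptions an average file fits comfortably, so the stronger argument against the Lambda option is the extra custom code and packaging, with memory and timeout limits becoming a risk only for unusually large or uneven files.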