AWS Certified Data Engineer Associate DEA-C01 Practice Question
Every 5 minutes, compressed JSON clickstream files (~200 MB) land in an Amazon S3 prefix. You must add a session-duration column, convert the data to partitioned Parquet, and write it back to S3 so that it is queryable in Athena within 10 minutes of arrival. The team needs a fully managed, auto-scaling, pay-per-use solution with minimal operational overhead. Which approach satisfies these requirements?
Launch an auto-scaling Amazon EMR cluster and schedule a Spark step every 5 minutes to process new files and write Parquet to S3.
Set up an S3 event to invoke an AWS Lambda function that reads each file, performs the transformation in memory, and stores the Parquet output in S3.
Create an external table in Amazon Redshift Spectrum on the JSON data, run a CTAS query every 5 minutes to convert to Parquet, and store the output in S3.
Use Amazon EventBridge to trigger an Amazon EMR Serverless Spark job that reads the new file from S3, enriches and converts it to partitioned Parquet, and writes the result back to S3.
Amazon EMR Serverless provides a serverless Spark runtime that automatically provisions and scales workers on demand. A rule in Amazon EventBridge (or another orchestrator) can invoke the Spark job when a new object is created in S3. The job enriches the data, converts it to partitioned Parquet, and writes the result back to S3 well inside the 10-minute SLA. You pay only for the vCPU-seconds and GB-seconds used and manage no clusters.
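The trigger path described above can be sketched as a small piece of glue code. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes the EventBridge rule targets a Lambda function that calls the EMR Serverless `StartJobRun` API through boto3, and that the bucket has EventBridge notifications enabled so that "Object Created" events are emitted. The application ID, role ARN, script location, and output prefix are hypothetical placeholders, and the Spark script itself (session-duration logic and Parquet partitioning) is not shown.

```python
# Hypothetical identifiers: replace with your own application, role, and paths.
APPLICATION_ID = "00example1234"
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/example-emr-serverless-job-role"
ENTRY_POINT = "s3://example-artifacts/jobs/sessionize_to_parquet.py"
OUTPUT_URI = "s3://example-curated/clickstream/"


def build_job_run_request(event: dict) -> dict:
    """Map an S3 'Object Created' EventBridge event to StartJobRun arguments."""
    detail = event["detail"]
    input_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    return {
        "applicationId": APPLICATION_ID,
        "executionRoleArn": EXECUTION_ROLE_ARN,
        # Reuse the EventBridge event ID as the idempotency token so that a
        # retried delivery of the same event carries the same token.
        "clientToken": event["id"],
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": ENTRY_POINT,
                # The (hypothetical) Spark script reads argv[1], writes to argv[2].
                "entryPointArguments": [input_uri, OUTPUT_URI],
            }
        },
    }


def handler(event, context):
    """Lambda entry point, assumed to be the target of the EventBridge rule."""
    import boto3  # imported lazily so the request builder can be tested offline

    client = boto3.client("emr-serverless")
    response = client.start_job_run(**build_job_run_request(event))
    return response["jobRunId"]
```

Keeping the request builder separate from the API call makes the event-to-arguments mapping easy to check without AWS credentials. A Step Functions state machine could fill the same orchestration role if retries or job-status polling are needed.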
An auto-scaling EMR cluster still requires provisioning and incurs EC2 charges while idle. A Lambda function is limited to 15 minutes of run time and 10 GB of memory, which makes decompressing and transforming ~200 MB compressed JSON files entirely in memory risky. Redshift Spectrum can query JSON in S3, but materializing transformed Parquet would require CTAS/UNLOAD jobs that run inside a provisioned Redshift cluster, adding latency and continuous cost.
Data Ingestion and Transformation