AWS Certified Data Engineer Associate DEA-C01 Practice Question
An e-commerce company ingests clickstream events into an Amazon Kinesis data stream. A Lambda function, invoked through an event source mapping, converts each record from JSON to Parquet and immediately stores it in Amazon S3. After one day, thousands of sub-100 KB Parquet files exist, inflating Athena query costs. The team needs exactly one Parquet file per shard every 15 minutes while keeping the solution fully serverless and low-cost. Which approach meets these requirements?
Create an AWS Glue streaming ETL job that reads from the Kinesis stream and writes partitioned Parquet files to S3 every 15 minutes.
Send the stream to Kinesis Data Firehose with Parquet conversion enabled and a 15-minute buffering interval, eliminating the Lambda function.
Increase the event source mapping batch size to the maximum and use S3 multipart upload so each invocation appends new data to the same object key.
Replace the event source mapping with an EventBridge rule that invokes the Lambda function every 15 minutes. In the function, use GetShardIterator and GetRecords to read each shard, aggregate the data, write one Parquet file per shard, and store the last sequence number in DynamoDB for checkpointing.
Invoking the function on a fixed schedule decouples ingestion from transformation. A single scheduled Lambda invocation can read all new records from each shard with GetShardIterator and GetRecords, aggregate them in memory, and then write one Parquet object per shard. Persisting the last sequence number for each shard in DynamoDB lets each run resume where the previous run stopped. This produces the desired file size and cadence while remaining serverless and inexpensive. The other options fall short: a larger batch size with multipart upload cannot work because S3 objects are immutable and cannot be appended to; Kinesis Data Firehose replaces the Lambda function but flushes when either its buffer size or its buffer interval is reached and does not write one file per shard; and a Glue streaming ETL job runs continuously, which costs more.
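The read-and-checkpoint loop described in the explanation might look roughly like the sketch below. It is an illustration under stated assumptions, not a reference implementation: `kinesis` is expected to behave like `boto3.client("kinesis")`, `checkpoints` is a plain dict-like object standing in for the DynamoDB table, and `write_parquet` is a hypothetical callback that would serialize the records and upload one object to S3. The names `drain_shard` and `run_once` are invented for this example.

```python
from typing import Callable, Optional


def drain_shard(kinesis, stream_name: str, shard_id: str,
                last_sequence_number: Optional[str] = None,
                max_calls: int = 1000):
    """Read the records currently available in one shard.

    Returns (records, last_sequence_number_seen).
    """
    if last_sequence_number is None:
        # No checkpoint yet: start at the oldest record still retained.
        iterator_args = {"ShardIteratorType": "TRIM_HORIZON"}
    else:
        # Resume right after the record the previous run finished on.
        iterator_args = {
            "ShardIteratorType": "AFTER_SEQUENCE_NUMBER",
            "StartingSequenceNumber": last_sequence_number,
        }
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id, **iterator_args
    )["ShardIterator"]

    records = []
    for _ in range(max_calls):  # hard cap so the loop always terminates
        if iterator is None:  # a closed shard has been read to the end
            break
        response = kinesis.get_records(ShardIterator=iterator, Limit=10000)
        batch = response["Records"]
        records.extend(batch)
        if batch:
            last_sequence_number = batch[-1]["SequenceNumber"]
        iterator = response.get("NextShardIterator")
        # MillisBehindLatest == 0 means the reader has reached the tip.
        if response.get("MillisBehindLatest", 0) == 0:
            break
    return records, last_sequence_number


def run_once(kinesis, stream_name: str, shard_ids, checkpoints,
             write_parquet: Callable[[str, list], None]) -> None:
    """One scheduled run: at most one output file per shard."""
    for shard_id in shard_ids:
        records, last_seq = drain_shard(
            kinesis, stream_name, shard_id, checkpoints.get(shard_id)
        )
        if records:
            write_parquet(shard_id, records)
            # Save the checkpoint only after the file has been written.
            checkpoints[shard_id] = last_seq
```

Saving the checkpoint after the write means a failed run re-reads the same records instead of skipping them. A real function would also need to stay within Lambda's 15-minute timeout and memory limits, and to pace its GetRecords calls to respect per-shard read limits.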
Data Ingestion and Transformation