AWS Certified Data Engineer Associate DEA-C01 Practice Question

An e-commerce company ingests clickstream events into an Amazon Kinesis data stream. A Lambda function, invoked through an event source mapping, converts each record from JSON to Parquet and immediately writes it to Amazon S3. After one day the bucket holds thousands of Parquet files smaller than 100 KB, which inflates Amazon Athena query costs. The team needs exactly one Parquet file per shard every 15 minutes while keeping the solution fully serverless and low-cost. Which approach meets these requirements?

  • Replace the event source mapping with an EventBridge rule that invokes the Lambda function every 15 minutes. In the function, use GetShardIterator and GetRecords to read each shard, aggregate the records, write one Parquet file per shard to S3, and store the last sequence number in DynamoDB for checkpointing.

  • Increase the event source mapping batch size to the maximum and use S3 multipart upload so each invocation appends new data to the same object key.

  • Create an AWS Glue streaming ETL job that reads from the Kinesis stream and writes partitioned Parquet files to S3 every 15 minutes.

  • Send the stream to an Amazon Kinesis Data Firehose delivery stream with Parquet format conversion enabled and a 15-minute buffering interval, eliminating the Lambda function.

Domain: Data Ingestion and Transformation
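
For illustration only, here is a minimal sketch of the scheduled polling-and-checkpoint pattern described in the first option. The stream name, bucket, DynamoDB table, and the to_parquet helper are all hypothetical placeholders, not part of the question; this is one plausible shape, not a reference implementation.

```python
# Hypothetical sketch: a Lambda handler triggered by an EventBridge schedule
# (e.g. rate(15 minutes)) that drains each shard once per run and checkpoints
# the last sequence number per shard in DynamoDB.
import boto3

STREAM = "clickstream"           # assumed stream name
CHECKPOINT_TABLE = "shard-checkpoints"  # assumed table, partition key: shard_id
BUCKET = "clickstream-parquet"   # assumed destination bucket

kinesis = boto3.client("kinesis")
s3 = boto3.client("s3")
checkpoints = boto3.resource("dynamodb").Table(CHECKPOINT_TABLE)

def handler(event, context):
    for shard in kinesis.list_shards(StreamName=STREAM)["Shards"]:
        shard_id = shard["ShardId"]
        item = checkpoints.get_item(Key={"shard_id": shard_id}).get("Item")
        if item:
            # Resume just past the last record written in the previous run.
            iterator = kinesis.get_shard_iterator(
                StreamName=STREAM, ShardId=shard_id,
                ShardIteratorType="AFTER_SEQUENCE_NUMBER",
                StartingSequenceNumber=item["sequence_number"],
            )["ShardIterator"]
        else:
            iterator = kinesis.get_shard_iterator(
                StreamName=STREAM, ShardId=shard_id,
                ShardIteratorType="TRIM_HORIZON",
            )["ShardIterator"]

        records, last_seq = [], None
        while iterator:
            resp = kinesis.get_records(ShardIterator=iterator, Limit=10000)
            records.extend(resp["Records"])
            if resp["Records"]:
                last_seq = resp["Records"][-1]["SequenceNumber"]
            if resp["MillisBehindLatest"] == 0:
                break  # caught up to the tip of the shard
            iterator = resp.get("NextShardIterator")

        if records:
            # to_parquet is an assumed helper that serializes the whole batch
            # into a single Parquet blob (e.g. via pyarrow).
            s3.put_object(
                Bucket=BUCKET,
                Key=f"{shard_id}/{context.aws_request_id}.parquet",
                Body=to_parquet(records),
            )
            checkpoints.put_item(
                Item={"shard_id": shard_id, "sequence_number": last_seq}
            )
```

Each run of this sketch emits at most one object per shard, which is the batching behavior the option describes.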
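Similarly, the buffering described in the last option is configured on the delivery stream itself. A hedged sketch follows; the stream name, ARNs, and the Glue Data Catalog database/table are placeholders.

```python
# Hypothetical sketch: a Firehose delivery stream that reads from the Kinesis
# stream, converts JSON to Parquet via a Glue schema, and flushes on the
# maximum 900-second (15-minute) buffering interval.
import boto3

firehose = boto3.client("firehose")
firehose.create_delivery_stream(
    DeliveryStreamName="clickstream-to-parquet",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/clickstream",
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-role",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-role",
        "BucketARN": "arn:aws:s3:::clickstream-parquet",
        # 900 s is the maximum interval; with format conversion enabled the
        # buffer size must be at least 64 MB.
        "BufferingHints": {"IntervalInSeconds": 900, "SizeInMBs": 128},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            "SchemaConfiguration": {
                "DatabaseName": "clickstream_db",  # assumed Glue database
                "TableName": "events",             # assumed Glue table
                "RoleARN": "arn:aws:iam::111122223333:role/firehose-role",
            },
        },
    },
)
```

Note that Firehose flushes one object per buffer per delivery stream, not per source shard, which is worth weighing against the question's "one file per shard" requirement.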