AWS Certified Data Engineer Associate DEA-C01 Practice Question

A data engineering team is building a Python AWS Lambda function that processes 50,000 records per second from an Amazon Kinesis data stream. The function must detect whether an incoming eventId has already been processed within the last 24 hours and discard duplicates. The function runs for up to 15 minutes and is limited to 1 GB of memory. Which solution is most appropriate?

  • Write each eventId to an Amazon S3 bucket and use S3 Select to query for existence before processing.

  • Push each eventId to an Amazon ElastiCache for Redis cluster with a 24-hour TTL and query Redis before processing.

  • Store each eventId in an in-memory Python set; duplicates are detected via O(1) membership tests.

  • Maintain an in-memory Bloom filter sized for 200 million items and rotate the filter every 24 hours.
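To make the Bloom filter option concrete, the sketch below sizes a filter with the standard formulas (bits m = -n ln p / (ln 2)^2, hash count k = (m / n) ln 2) and uses it for duplicate checks. The 1% false-positive rate, the SHA-256 double-hashing scheme, and the names bloom_parameters, BloomFilter, and seen_before are illustrative assumptions; none of them come from the question.

```python
import hashlib
import math


def bloom_parameters(expected_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (bit_count, hash_count) from the standard Bloom filter formulas."""
    bit_count = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    hash_count = max(1, round(bit_count / expected_items * math.log(2)))
    return bit_count, hash_count


class BloomFilter:
    """Minimal Bloom filter backed by a bytearray (hypothetical helper class)."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        self.bit_count, self.hash_count = bloom_parameters(
            expected_items, false_positive_rate
        )
        self.bits = bytearray((self.bit_count + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive every bit position from two 64-bit values
        # taken from a single SHA-256 digest (an assumed, common scheme).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.bit_count

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))


def seen_before(bloom: BloomFilter, event_id: str) -> bool:
    """Return True if event_id was probably seen already; otherwise record it."""
    if event_id in bloom:
        return True
    bloom.add(event_id)
    return False


if __name__ == "__main__":
    bits, hashes = bloom_parameters(200_000_000, 0.01)
    print(f"{bits / 8 / 1e6:.0f} MB of bits, {hashes} hash functions")
```

Under these assumptions, a filter for 200 million items needs roughly 240 MB and 7 hash functions, which fits within a 1 GB memory limit. Two caveats apply when weighing the options: a Bloom filter can report an unseen eventId as a duplicate (false positives, never false negatives), and any state held in function memory exists only in a single Lambda execution environment. At 50,000 records per second, a 24-hour window also spans about 4.32 billion events (50,000 × 86,400), so the item count used for sizing matters.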

Domain: Data Ingestion and Transformation