AWS Certified Data Engineer Associate DEA-C01 Practice Question
A data engineering team is building a Python AWS Lambda function that processes 50,000 records per second from an Amazon Kinesis data stream. The function must detect whether an incoming eventId has already been processed within the last 24 hours and discard duplicates. The function runs for up to 15 minutes and is limited to 1 GB of memory. Which solution is most appropriate?
Write each eventId to an Amazon S3 bucket and use S3 Select to query for existence before processing.
Push each eventId to an Amazon ElastiCache for Redis cluster with a 24-hour TTL and query Redis before processing.
Store each eventId in an in-memory Python set; duplicates are detected via O(1) membership tests.
Maintain an in-memory Bloom filter sized for 200 million items and rotate the filter every 24 hours.
A Bloom filter is a probabilistic set-membership data structure. It answers "has this key been seen?" in constant time (a fixed number of hash computations, independent of how many items are stored) while using far less memory than a hash table. At a 1% false-positive rate, a Bloom filter that tracks 200 million eventIds needs about 9.6 bits per item, roughly 240 MB in total. That is well within the 1 GB Lambda limit, and its O(1) lookups keep up with 50,000 events per second. A Bloom filter never produces false negatives, so every eventId already inserted is recognized as a duplicate. The trade-off is that up to about 1% of genuinely new eventIds may be reported as seen and discarded.
A Python set holding 200 million string IDs would consume many gigabytes, far exceeding the memory allocation. S3 Select adds a network round-trip and per-query scan costs to every record, so it cannot keep pace with the required throughput. Redis can store the data, but every lookup becomes a network call, which adds latency and cost compared with an in-process data structure. Therefore, an in-memory Bloom filter that is rotated every 24 hours is the most cost-effective and performant choice.
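The sizing and rotation described above can be sketched in Python, the runtime named in the question. This is a minimal sketch that uses the standard Bloom filter sizing formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. It assumes at most 200 million distinct eventIds per 24-hour window, as the option states. The names `bloom_parameters`, `BloomFilter`, and `RotatingDeduplicator` are hypothetical, as are the BLAKE2b double-hashing scheme and the reset-on-expiry rotation policy. They are illustrative choices, not part of the question or of any AWS API.

```python
import hashlib
import math
import time
from typing import Callable, Iterator


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions."""
    m_bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k_hashes = max(1, round(m_bits / n_items * math.log(2)))
    return m_bits, k_hashes


class BloomFilter:
    """Hypothetical Bloom filter backed by a bytearray; supports add() and `in`."""

    def __init__(self, n_items: int, fp_rate: float) -> None:
        self.m_bits, self.k_hashes = bloom_parameters(n_items, fp_rate)
        self.bits = bytearray((self.m_bits + 7) // 8)  # all bits start at 0

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one BLAKE2b digest.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force a non-zero step
        for i in range(self.k_hashes):
            yield (h1 + i * h2) % self.m_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, key: str) -> bool:
        # True only if every position is set; any cleared bit proves the key was never added.
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(key))


class RotatingDeduplicator:
    """Hypothetical helper: one Bloom filter, replaced by an empty one after each window."""

    def __init__(self, n_items: int, fp_rate: float,
                 window_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.n_items = n_items
        self.fp_rate = fp_rate
        self.window_seconds = window_seconds
        self.clock = clock
        self._rotate()

    def _rotate(self) -> None:
        self.filter = BloomFilter(self.n_items, self.fp_rate)
        self.window_start = self.clock()

    def is_duplicate(self, event_id: str) -> bool:
        """Return True if event_id was (probably) seen in this window; otherwise record it."""
        if self.clock() - self.window_start >= self.window_seconds:
            self._rotate()  # window elapsed: start over with an empty filter
        if event_id in self.filter:
            return True
        self.filter.add(event_id)
        return False


if __name__ == "__main__":
    m_bits, k = bloom_parameters(200_000_000, 0.01)
    print(f"{m_bits / 8 / 1e6:.0f} MB, k={k}")  # about 240 MB, k=7

    dedup = RotatingDeduplicator(n_items=1_000, fp_rate=0.01)
    print(dedup.is_duplicate("evt-1"))  # False: first sighting
    print(dedup.is_duplicate("evt-1"))  # True: already recorded
```

In a Lambda function, the deduplicator would typically be created once at module level (for example with n_items=200_000_000 and fp_rate=0.01) so that warm invocations reuse it. That state is lost whenever Lambda starts a new execution environment, and each concurrent environment keeps its own filter. A hard reset also forgets everything at once, so a duplicate that arrives just after rotation is not caught. Keeping the previous filter and checking both closes that gap, at roughly twice the memory (about 480 MB at this size).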
Domain: Data Ingestion and Transformation