AWS Certified Solutions Architect Professional SAP-C02 Practice Question
A solutions architect is designing a large multi-tenant SaaS application on AWS. The application uses a fleet of EC2 instances in an Auto Scaling group to process asynchronous jobs from an Amazon SQS queue. A single job from one tenant, known as a 'poison pill', could potentially cause a worker instance to crash repeatedly. This could lead to a rapid succession of instance terminations and launches, consuming resources and impacting the job processing capability for all tenants sharing the fleet. The architect needs to design a solution that minimizes the blast radius of such a failure, ensuring a problem caused by a single tenant affects the fewest other tenants possible. Which approach provides the most effective failure isolation for this scenario?
Implement a strict bulkhead pattern by provisioning a dedicated Auto Scaling group and SQS queue for each tenant.
Configure the Auto Scaling group to span multiple Availability Zones and place an Application Load Balancer in front of the EC2 instances to distribute jobs.
Configure a dead-letter queue (DLQ) on the main SQS queue to automatically isolate messages that fail processing multiple times.
Implement shuffle sharding by creating multiple target groups (virtual shards) from the total worker fleet and mapping each tenant to a unique combination of target groups.
The correct answer is to implement shuffle sharding. Shuffle sharding provides strong workload isolation by creating many virtual shards from a smaller pool of resources (the worker fleet) and assigning each tenant a unique combination of those resources. If a poison pill takes down the workers in one tenant's virtual shard, only tenants assigned to that exact combination lose all of their capacity; tenants whose shards overlap it only partially keep their remaining workers. This gives a much smaller blast radius than traditional sharding, where every tenant placed on the failed shard is fully affected.
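The combinatorics behind that claim can be sketched in a few lines of Python. This is a minimal illustration, not an AWS API: the fleet size of 8 workers, the shard size of 2, the tenant names, and the helper functions `assign_shards` and `blast_radius` are all assumptions made for the example.

```python
import random
from itertools import combinations
from math import comb

NUM_WORKERS = 8  # hypothetical fleet size
SHARD_SIZE = 2   # hypothetical number of workers per tenant


def assign_shards(tenants, num_workers, shard_size, seed=0):
    """Give each tenant a unique combination of workers (its virtual shard)."""
    combos = list(combinations(range(num_workers), shard_size))
    if len(tenants) > len(combos):
        raise ValueError("more tenants than unique worker combinations")
    # Shuffle so that neighbouring tenants do not get neighbouring shards.
    random.Random(seed).shuffle(combos)
    return dict(zip(tenants, combos))


def blast_radius(assignments, poisoned_tenant):
    """Classify the other tenants after the poisoned tenant's workers crash."""
    bad = set(assignments[poisoned_tenant])
    fully, partially, unaffected = [], [], []
    for tenant, shard in assignments.items():
        if tenant == poisoned_tenant:
            continue
        lost = bad & set(shard)
        if len(lost) == len(shard):
            fully.append(tenant)       # every worker in the shard is down
        elif lost:
            partially.append(tenant)   # at least one healthy worker remains
        else:
            unaffected.append(tenant)
    return fully, partially, unaffected


# One tenant per possible combination: C(8, 2) = 28 tenants on 8 workers.
tenants = [f"tenant-{i}" for i in range(comb(NUM_WORKERS, SHARD_SIZE))]
assignments = assign_shards(tenants, NUM_WORKERS, SHARD_SIZE)
fully, partially, unaffected = blast_radius(assignments, "tenant-0")
print(len(fully), len(partially), len(unaffected))  # 0 12 15
```

With these assumed numbers, no other tenant loses its whole shard, 12 tenants lose one of their two workers, and 15 are untouched. Splitting the same 8 workers into 4 fixed shards of 2 would instead take down every tenant that shares the poisoned tenant's shard.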
Implementing a separate Auto Scaling group for each tenant is an example of a bulkhead pattern, but it is not practical or cost-effective for a large-scale, multi-tenant application with thousands of tenants due to the high operational overhead and resource underutilization.
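Using a standard Multi-AZ Auto Scaling group with an Application Load Balancer is a fundamental high-availability pattern that protects against Availability Zone failures, not application-level correlated failures like a poison pill. Because every instance runs the same code and pulls from the same queue, a redelivered poison pill can crash instances in any Availability Zone, eventually affecting the entire fleet. An Application Load Balancer also does not distribute SQS jobs; the workers poll the queue directly.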
Configuring a dead-letter queue (DLQ) on the SQS queue is an essential practice for handling poison pill messages, but it does not solve the architectural problem of blast radius for the compute fleet. The DLQ isolates the problematic message after it has failed processing multiple times, but during those failures, it would have already impacted the shared compute fleet, affecting all tenants. Shuffle sharding proactively contains the impact of the compute failure itself.
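The timing problem with relying on a DLQ alone can be illustrated with a simplified in-memory model. This is a sketch only and does not call the SQS API: `process_queue`, the `poison` naming convention, and the value 3 are assumptions, although `maxReceiveCount` is the real redrive policy setting the model imitates.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # assumed value for the redrive policy's maxReceiveCount


def process_queue(messages, max_receive_count):
    """Model a shared queue in which poison messages crash the receiving worker."""
    queue = deque({"body": m, "receives": 0} for m in messages)
    processed, dlq, crashes = [], [], 0
    while queue:
        msg = queue.popleft()
        if msg["receives"] >= max_receive_count:
            # Receive count exhausted: the message moves to the DLQ.
            dlq.append(msg["body"])
            continue
        msg["receives"] += 1
        if msg["body"].startswith("poison"):
            crashes += 1       # the worker dies before deleting the message
            queue.append(msg)  # the message becomes visible again later
        else:
            processed.append(msg["body"])
    return processed, dlq, crashes


processed, dlq, crashes = process_queue(
    ["job-a", "poison-job", "job-b"], MAX_RECEIVE_COUNT
)
print(processed, dlq, crashes)  # ['job-a', 'job-b'] ['poison-job'] 3
```

In this model the poison message still crashes three workers on the shared fleet before it reaches the DLQ, which is the window in which shuffle sharding limits the damage.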