AWS Certified Solutions Architect Professional SAP-C02 Practice Question
Your company operates three multi-tenant microservice platforms on Amazon EKS in separate AWS Regions. Each cluster already exposes Prometheus metrics through the AWS Distro for OpenTelemetry (ADOT) collector. During the last six months, daily peak request volume has grown by roughly 20 percent month-over-month, and leadership wants to avoid reliability incidents caused by capacity shortages.
The SRE team must design a monitoring solution that lets them study long-term usage trends so they can proactively request service-quota increases and scale the clusters before bottlenecks occur.
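To see why long-term trend data matters here, the roughly 20 percent month-over-month growth from the scenario can be compounded forward. The following is a minimal sketch that assumes the growth rate stays constant, which real traffic rarely does; `project_peak` is a hypothetical helper name and the baseline of 1.0 simply expresses results as a multiple of today's peak.

```python
# Hypothetical helper: project peak request volume under steady compound growth.
# Assumption: the monthly growth rate stays constant over the whole horizon.
def project_peak(baseline: float, monthly_growth: float, months: int) -> float:
    """Return the projected peak after `months` of compound monthly growth."""
    return baseline * (1.0 + monthly_growth) ** months


# Multiplier relative to today's peak at 20 percent month-over-month growth.
for months in (3, 6, 12):
    print(f"{months:>2} months: {project_peak(1.0, 0.20, months):.2f}x")
# prints:
#  3 months: 1.73x
#  6 months: 2.99x
# 12 months: 8.92x
```

Under this assumption the peak roughly triples in six months, so quota increases and cluster scaling need to be requested well ahead of demand.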
Business requirements:
- Retain all application and infrastructure metrics for at least 36 months without managing any servers.
- Provide ad-hoc, interactive dashboards that automatically include new clusters as they are added.
- Minimize operational overhead while following AWS best practices for observability.
Which solution will meet these requirements?
Install the CloudWatch agent on each node to publish custom metrics; stream the metrics to Amazon S3 with CloudWatch Metric Streams and Amazon Data Firehose; query with Amazon Athena; and visualize trends in Amazon QuickSight.
Enable Amazon CloudWatch Metrics Insights, create anomaly-detection alarms, and use CloudWatch dashboards to trend container metrics for the required 36-month period.
Deploy self-managed Prometheus servers with Thanos sidecars in every cluster, store metrics in an Amazon S3 bucket, and run Grafana on Amazon EC2 instances for visualization.
Use a single Amazon Managed Service for Prometheus workspace configured with a 1,095-day retention period. Configure each ADOT collector to remote-write to the workspace, and connect an Amazon Managed Grafana workspace to the Amazon Managed Service for Prometheus workspace for fleet-wide interactive dashboards.