AWS Certified CloudOps Engineer Associate Practice Test (SOA-C03)
Use the form below to configure your AWS Certified CloudOps Engineer Associate Practice Test (SOA-C03). The practice test can be configured to only include certain exam objectives and domains. You can choose between 5 and 100 questions and set a time limit.

AWS Certified CloudOps Engineer Associate SOA-C03 Information
The AWS Certified CloudOps Engineer – Associate certification validates your ability to deploy, operate, and manage cloud workloads on AWS. It’s designed for professionals who maintain and optimize cloud systems while ensuring they remain reliable, secure, and cost-efficient. This certification focuses on modern cloud operations and engineering practices, emphasizing automation, monitoring, troubleshooting, and compliance across distributed AWS environments. You’ll be expected to understand how to manage and optimize infrastructure using services like CloudWatch, CloudTrail, EC2, Lambda, ECS, EKS, IAM, and VPC.
The exam covers the full lifecycle of cloud operations through five key domains: Monitoring and Performance, Reliability and Business Continuity, Deployment and Automation, Security and Compliance, and Networking and Content Delivery. Candidates are tested on their ability to configure alerting and observability, apply best practices for fault tolerance and high availability, implement infrastructure as code, and enforce security policies across AWS accounts. You’ll also demonstrate proficiency in automating common operational tasks and handling incident response scenarios using AWS tools and services.
Earning this certification shows employers that you have the technical expertise to manage AWS workloads efficiently at scale. It’s ideal for CloudOps Engineers, Cloud Support Engineers, and Systems Administrators who want to prove their ability to keep AWS environments running smoothly in production. By earning this credential, you demonstrate the hands-on skills needed to ensure operational excellence and reliability in today’s fast-moving cloud environments.

Free AWS Certified CloudOps Engineer Associate SOA-C03 Practice Test
- 20 Questions
- Unlimited
- Monitoring, Logging, Analysis, Remediation, and Performance Optimization
- Reliability and Business Continuity
- Deployment, Provisioning, and Automation
- Security and Compliance
- Networking and Content Delivery
Free Preview
This test is a free preview, no account required.
An EC2 m6i.large instance copies a 2 TB tar file to an S3 bucket with the command aws s3 cp /data/archive.tar s3://corp-logs/. CloudWatch network metrics show the instance can sustain 8 Gbps, but the transfer stalls around 500 Mbps and uses only one TCP connection. Without changing the instance type or writing custom code, which AWS CLI adjustment will MOST increase upload throughput?
Add an S3 transfer configuration in ~/.aws/config such as: [s3] multipart_threshold = 64MB, max_concurrent_requests = 50.
Set multipart_chunksize = 5MB to create many smaller parts during the upload.
Turn off enhanced networking on the instance to eliminate driver overhead.
Use --storage-class GLACIER in the cp command so the object uploads into the GLACIER storage class.
Answer Description
The AWS CLI reaches higher S3 PUT throughput by uploading multiple parts concurrently. Adding the settings multipart_threshold = 64MB (so the 2 TB object is automatically split) and max_concurrent_requests = 50 (so up to 50 threads transfer parts in parallel) enables dozens of parallel TCP streams, allowing the instance to drive far more bandwidth. Changing the storage class provides no performance benefit, using 5 MB parts adds overhead rather than throughput, and disabling enhanced networking would reduce, not increase, network speed.
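For reference, the AWS CLI documents these S3 settings nested under a profile section of ~/.aws/config; a minimal sketch with illustrative values:

  [default]
  s3 =
    multipart_threshold = 64MB
    multipart_chunksize = 64MB
    max_concurrent_requests = 50

The same settings can also be applied from the command line, for example: aws configure set default.s3.max_concurrent_requests 50.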
Ask Bash
Bash is our AI bot, trained to help you pass your exam. AI Generated Content may display inaccurate information, always double-check anything important.
What is multipart upload in S3?
How does `max_concurrent_requests` improve S3 upload speeds?
Why is `multipart_chunksize` critical in S3 uploads?
What does `multipart_threshold` control in AWS CLI?
An operations team reviews CloudWatch metrics for a 4 TiB io1 volume provisioned with 20,000 IOPS that backs a bursty analytics workload. During peak hours VolumeReadOps stays below 500 IOPS and VolumeQueueLength remains under 1. Management asks to slash storage costs without impacting current performance. What is the MOST cost-effective change?
Replace the io1 volume with a 4 TiB gp3 volume using the default 3,000 IOPS and 125 MiB/s throughput.
Lower the provisioned IOPS on the io1 volume from 20,000 to 2,000.
Convert the volume to an st1 throughput-optimized HDD volume.
Convert the volume to a gp2 general-purpose SSD volume of the same size.
Answer Description
The workload requires only a few hundred IOPS, so the io1 volume is greatly over-provisioned. A gp3 volume delivers a baseline of 3,000 IOPS and 125 MiB/s throughput at any size for a per-GiB price about 20 percent lower than gp2 and far lower than io1, with no per-IOPS charge until the application exceeds the baseline. That performance margin comfortably covers the observed 500-IOPS peak while minimizing cost. Converting to st1 would dramatically limit random I/O to roughly 500 IOPS, risking throttling. Reducing io1 provisioned IOPS still leaves the premium io1 price model in place. Switching to gp2 would work functionally, but gp3 offers the same or better performance at a lower price, making it the most economical option.
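The migration needs no downtime; as a sketch, assuming a hypothetical volume ID, the change can be applied in place:

  aws ec2 modify-volume \
    --volume-id vol-0123456789abcdef0 \
    --volume-type gp3 \
    --iops 3000 \
    --throughput 125

  # Track the modification until it reaches the optimizing or completed state
  aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0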
Ask Bash
What is the difference between io1 and gp3 volumes in AWS?
Why is gp3 a better choice for bursty analytics workloads compared to st1?
How does gp3 pricing compare to gp2 and io1 volumes?
What does the VolumeQueueLength metric indicate in CloudWatch?
When should you choose st1 volumes over gp3 or io1?
A company runs a long-running scientific simulation on a single Amazon EC2 instance. The CloudWatch agent publishes a custom MemoryUtilization metric. If memory usage stays above 90 percent for 5 consecutive minutes, an existing Systems Manager Automation runbook must clear application caches on that same instance automatically, without manual intervention. Which approach meets these requirements with the least operational overhead?
Place the instance in an Auto Scaling group with a step-scaling policy based on MemoryUtilization and use a lifecycle hook to run the cache-clearing runbook when the group scales out.
Create a CloudWatch alarm and an EventBridge rule that invokes an AWS Lambda function. The function reads the InstanceId from the event and calls StartAutomationExecution to run the cache-clearing runbook on that instance.
Create a CloudWatch alarm for MemoryUtilization > 90 percent for 5 datapoints. Add an EventBridge rule that filters for the alarm's ALARM state and sets the existing Systems Manager Automation runbook as the target. Use an input transformer to pass the InstanceId from the event to the runbook.
Define a Systems Manager Maintenance Window that executes the cache-clearing runbook every 10 minutes, with a pre-task script that exits if memory usage is below 90 percent.
Answer Description
The most lightweight solution is to use Amazon EventBridge as the glue between the CloudWatch alarm and Systems Manager. A CloudWatch alarm for MemoryUtilization > 90 percent (5 data points) automatically emits a state-change event on the default event bus. A rule that matches the alarm's change to the ALARM state can target Systems Manager Automation directly. By adding an input transformer, the rule passes the InstanceId dimension from the event payload to the runbook's InstanceId parameter, ensuring that the cache-clearing automation runs only on the affected instance. No Lambda functions, Auto Scaling groups, or Maintenance Windows are required.
A Lambda-based approach functions correctly but adds an extra service to build, secure, and maintain. Auto Scaling and Maintenance Windows either perform the wrong remediation or run on a schedule regardless of the metric state. CloudWatch alarm EC2 actions cannot invoke Automation documents, so they do not satisfy the requirement.
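As an illustrative sketch, the EventBridge rule's event pattern could match the alarm's transition to the ALARM state like this (the alarm name is hypothetical):

  {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
      "alarmName": ["HighMemoryUtilization"],
      "state": { "value": ["ALARM"] }
    }
  }

An input transformer on the rule then maps the instance identifier from the event detail to the runbook's InstanceId parameter.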
Ask Bash
What does an input transformer do in EventBridge?
How do CloudWatch alarms trigger state changes?
Why is EventBridge preferred over AWS Lambda in this solution?
What is Amazon EventBridge used for?
Why is the CloudWatch alarm state change suitable for automation?
EC2 instances in a private subnet are unable to connect to a public API over HTTPS. The private subnet's route table directs 0.0.0.0/0 traffic to a NAT gateway. The instances' security group allows outbound TCP port 443. VPC flow logs on the instances' network interfaces show 'REJECT' entries for inbound traffic on destination ports 1024-65535. Which action will restore connectivity without making the instances publicly accessible?
Attach an internet gateway to the private subnet and add a 0.0.0.0/0 route to it.
Update the private subnet's network ACL to allow inbound TCP traffic on ports 1024-65535 from 0.0.0.0/0.
Add an inbound rule for TCP port 443 to the EC2 instances' security group.
Disable source/destination checking on the NAT gateway's elastic network interface.
Answer Description
Because network ACLs are stateless, return traffic must be explicitly allowed. The NAT gateway translates the source instance's IP address, but the return traffic from the API (source port 443) must be able to reach the original instance on the ephemeral port it used to initiate the connection (destination ports 1024-65535). If the private subnet's NACL does not allow inbound traffic on this high port range, the connection fails. This rejection is captured by VPC flow logs. Allowing this inbound ephemeral port range on the private subnet's NACL resolves the issue. Attaching an internet gateway would make the instances public, which violates the requirement. Disabling source/destination check is only relevant for NAT instances, not managed NAT gateways. Security groups are stateful, so they automatically allow return traffic for connections initiated by the instance; no inbound rule is needed.
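For illustration, the ephemeral-port rule could be added with a call like the following (the ACL ID and rule number are hypothetical):

  aws ec2 create-network-acl-entry \
    --network-acl-id acl-0abc1234def567890 \
    --ingress \
    --rule-number 120 \
    --protocol tcp \
    --port-range From=1024,To=65535 \
    --cidr-block 0.0.0.0/0 \
    --rule-action allow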
Ask Bash
Why are network ACLs stateless?
What is the role of ephemeral ports in network communication?
How does the NAT gateway facilitate private instance traffic?
How do stateless network ACLs affect traffic within a VPC?
What is the difference between security groups and network ACLs?
After applying a custom network ACL to a private subnet that hosts EC2 instances that call external SaaS APIs through a NAT gateway, outbound HTTPS traffic fails. The ACL allows outbound TCP 443 to 0.0.0.0/0 and denies all other outbound traffic. Inbound rules allow TCP 22 from 10.0.0.0/16 and TCP 443 from 0.0.0.0/0, then deny all. Which modification will restore connectivity with least privilege?
Change the existing outbound rule to allow all protocols to 0.0.0.0/0.
Add an outbound allow rule for TCP ports 1024-65535 to 0.0.0.0/0.
Add an inbound allow rule for TCP ports 1024-65535 from 0.0.0.0/0.
Replace the outbound rule with UDP port 443 to 0.0.0.0/0.
Answer Description
Network ACLs are stateless, so return traffic must be explicitly permitted. When an instance initiates an HTTPS session, the reply from the remote host arrives from source port 443 and is addressed to an ephemeral port (1024-65535) on the instance. Because the inbound rule set does not currently allow that destination port range, the response is dropped, breaking the connection. Adding an inbound rule that allows TCP 1024-65535 from any source permits only the necessary return traffic while leaving other traffic blocked, satisfying the principle of least privilege. Allowing the same range on the outbound side does nothing for inbound return packets, and opening all ports or all protocols is less restrictive than required.
Ask Bash
Why are Network ACLs stateless?
What is an ephemeral port and why is it needed in this scenario?
What does 'least privilege' mean in the context of network security?
Your company maintains a central monitoring account (us-east-1) with CloudWatch dashboards. You must add widgets that show the CPUUtilization metric of EC2 instances in two production accounts (prod-01, prod-02) in us-west-2. Developers in the monitoring account must be able to view dashboards via console or CLI but must not create, modify, or delete them. No extra infrastructure may be deployed. Which approach meets these needs with minimal operational effort?
Generate a CloudWatch dashboard snapshot for each production account and embed the PNG URLs in a new dashboard in the monitoring account. Restrict developers to Amazon S3 read-only access so they cannot update dashboards.
Export the CPUUtilization metrics to Amazon S3 with an EventBridge rule, load the data into Amazon QuickSight, and build a cross-account analysis dashboard. Assign developers to a QuickSight reader group.
Create a CloudWatch dashboard in each production account and share them with the monitoring account by using AWS Resource Access Manager. Give developers the ReadOnlyAccess AWS-managed policy.
Enable CloudWatch cross-account observability to link the two production accounts as source accounts, then create the widgets using the account and Region qualifier (for example, accountId=prod-01). Attach an IAM policy to the developers' role that permits GetDashboard, ListDashboards, GetMetricData, and ListMetrics but not PutDashboard.
Answer Description
CloudWatch cross-account observability lets you visualize metrics that reside in other AWS accounts without moving data. You designate the monitoring account as a monitoring account and link the prod-01 and prod-02 accounts as source accounts. CloudWatch automatically grants the monitoring account permission to query GetMetricData and ListMetrics in each source account, so the dashboard widgets can reference metrics by specifying the account ID and Region.
To satisfy the security constraint, attach an IAM policy to the developers' role in the monitoring account that allows only cloudwatch:GetDashboard, cloudwatch:ListDashboards, cloudwatch:GetMetricData, and cloudwatch:ListMetrics. The policy omits cloudwatch:PutDashboard, which prevents users from creating, editing, or deleting dashboards.
Options that rely on CloudWatch dashboard snapshots or AWS Resource Access Manager do not allow live cross-account metrics. Exporting metrics to S3 and building a QuickSight dashboard adds unnecessary infrastructure and does not use CloudWatch dashboards.
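A policy sketch granting only the four read actions (resource scoping could be tightened further):

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "cloudwatch:GetDashboard",
          "cloudwatch:ListDashboards",
          "cloudwatch:GetMetricData",
          "cloudwatch:ListMetrics"
        ],
        "Resource": "*"
      }
    ]
  }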
Ask Bash
How does CloudWatch cross-account observability work?
What permissions are needed to restrict actions on CloudWatch dashboards?
What is the difference between CloudWatch dashboards and snapshots?
How do IAM policies restrict actions in CloudWatch?
Why are other options like snapshots or QuickSight less effective?
An operations engineer installed the CloudWatch agent on several Amazon Linux 2 EC2 instances by using the Systems Manager document AWS-ConfigureAWSPackage. A custom JSON file (shown below) was deployed to each instance and the agent was restarted.
{
  "agent": { "metrics_collection_interval": 60 },
  "metrics": {
    "append_dimensions": { "InstanceId": "${aws:InstanceId}" },
    "aggregation_dimensions": [["InstanceId"]]
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/opt/app/server.log",
            "log_group_name": "app-logs",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
Application logs are now visible in CloudWatch Logs, but no memory or disk space metrics appear in CloudWatch Metrics. What is the simplest way to collect these missing metrics on every instance?
Insert mem and disk sections under metrics_collected in the agent JSON file, then restart the CloudWatch agent on each instance.
Turn on detailed monitoring for the instances in the EC2 console.
Attach the managed policy CloudWatchAgentAdminPolicy to the instance profile role.
Edit the AWS-ConfigureAWSPackage document to run the agent in collectd compatibility mode.
Answer Description
The CloudWatch agent publishes only the metrics that are explicitly listed in the metrics_collected section of its JSON configuration. The current file defines no collectors, so the agent sends no memory or disk data even though it is running. Adding the appropriate collectors (for example, a mem block for memory and a disk block for file-system usage) and then restarting or reloading the agent causes the agent to gather and publish those metrics. Enabling EC2 detailed monitoring affects only the built-in instance metrics (CPU, network, etc.) and cannot add memory or disk metrics. Changing the instance role's permissions or modifying the Systems Manager document does not cause the agent to start collecting additional metrics when they are not specified in the configuration.
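A minimal sketch of the missing section, following the agent's documented schema (the exact measurements can be tuned):

  "metrics": {
    "append_dimensions": { "InstanceId": "${aws:InstanceId}" },
    "aggregation_dimensions": [["InstanceId"]],
    "metrics_collected": {
      "mem": { "measurement": ["mem_used_percent"] },
      "disk": { "measurement": ["used_percent"], "resources": ["*"] }
    }
  }

After editing the file, reloading the agent with amazon-cloudwatch-agent-ctl -a fetch-config applies the new configuration.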
Ask Bash
What is the role of the CloudWatch agent in collecting metrics?
How does the JSON configuration file in CloudWatch agent affect monitoring?
Why doesn’t enabling detailed monitoring in EC2 add memory or disk metrics?
What is the purpose of the metrics_collected section in the CloudWatch agent JSON file?
How do you add memory and disk metrics to the CloudWatch agent configuration file?
What is the difference between CloudWatch agent metrics and detailed monitoring in the EC2 console?
Your company manages infrastructure for multiple AWS accounts using Terraform. You must build a CI/CD pipeline that: validates plans on every commit, stores Terraform state centrally with locking to prevent simultaneous writes, and avoids long-lived credentials in the pipeline environment. Which approach meets these requirements while following AWS and Terraform best practices?
Configure an encrypted, versioned S3 bucket with a DynamoDB table for state locking; have CodeBuild assume an environment-specific IAM role via STS and run Terraform with the S3 backend.
Store the state file in a CodeCommit repository and enable repository versioning; store each account's access keys in Secrets Manager and inject them into the build environment.
Use the local backend on the CodeBuild container and rely on CodePipeline artifact versioning; create a single IAM user with AdministratorAccess and embed its access keys in the buildspec file.
Wrap Terraform modules in CloudFormation StackSets and use CloudFormation as the remote backend; pass cross-account role ARNs to CodePipeline through environment variables.
Answer Description
Storing the Terraform state in an S3 bucket that has server-side encryption and versioning, while using a DynamoDB table for state locking, satisfies the requirement for a central, collision-free state store. In the pipeline, CodeBuild can assume an account-specific IAM role through AWS STS, so no permanent access keys are exposed. Terraform is initialized with the S3 backend and automatically uses the temporary credentials provided by the assumed role. The other options either lack state locking, rely on insecure long-lived credentials, or misuse services (for example, CodeCommit and CloudFormation are not supported remote backends for Terraform state).
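As a sketch, the Terraform backend block could look like this (bucket, key, and table names are hypothetical):

  terraform {
    backend "s3" {
      bucket         = "corp-terraform-state"
      key            = "prod/terraform.tfstate"
      region         = "us-east-1"
      dynamodb_table = "terraform-locks"
      encrypt        = true
    }
  }

CodeBuild then runs terraform init with this backend after assuming the environment-specific role, so the temporary STS credentials are picked up automatically.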
Ask Bash
Why is an S3 bucket with DynamoDB used for managing Terraform state?
How does AWS STS help avoid long-lived credentials in pipelines?
Why are the other options for Terraform state management incorrect?
What is state locking in Terraform and why is it important?
How does CodeBuild assume an IAM role using AWS STS?
What is the Terraform S3 backend and how does it work?
A company runs a fleet of 20 Amazon EC2 m5.4xlarge instances in an Auto Scaling group across two Availability Zones. CloudWatch shows that for the last 14 days, average CPU utilization has been 18 percent and network throughput is consistently low. Memory usage is below 35 percent on all instances. Management asks the CloudOps engineer to reduce EC2 costs while keeping the same two-AZ architecture and leaving application code unchanged. Which action is the MOST cost-effective and requires the LEAST operational effort?
Enable AWS Compute Optimizer for the account and apply its rightsizing recommendation to move the Auto Scaling group to smaller burstable performance instances that still meet the observed workload.
Configure the Auto Scaling group to launch Spot Instances of the same size in one Availability Zone and On-Demand instances in the other.
Purchase one-year Standard Reserved Instances for the existing m5.4xlarge instance type to obtain a discounted hourly rate.
Create a target tracking scaling policy to double the desired capacity when CPU exceeds 50 percent and halve it when CPU drops below 20 percent.
Answer Description
AWS Compute Optimizer analyzes CloudWatch metrics and compares them against instance specifications, then recommends smaller or different instance types that will satisfy observed resource needs at lower cost. Applying the rightsizing recommendation reduces over-provisioned capacity without touching application code or changing the multi-AZ design. Reserved Instances lower the hourly rate but lock in the same oversized m5.4xlarge footprint, so waste remains. Mixing Spot and On-Demand capacity would cut cost but introduces interruption risk and adds operational complexity. Adding a scaling policy still uses the oversized instance type; because utilization rarely exceeds 18 percent, the group will seldom scale in or save meaningful cost. Therefore, enabling Compute Optimizer and acting on its recommendation is the most economical, low-effort solution.
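For illustration, Compute Optimizer can be enabled and its recommendations retrieved from the CLI once the service has gathered enough metric history:

  aws compute-optimizer update-enrollment-status --status Active
  aws compute-optimizer get-ec2-instance-recommendations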
Ask Bash
What is AWS Compute Optimizer?
What are burstable performance instances in AWS?
How does using Reserved Instances differ from applying rightsizing recommendations?
Why is a Reserved Instance not the best option in this case?
Your company runs Linux and Windows EC2 instances spread across three AWS accounts. Operations must collect the instances' memory utilization and a set of custom application log files in Amazon CloudWatch without manually copying configuration files to every server. The team also wants to be able to update the agent configuration from a central location. Which approach satisfies these requirements with the least operational overhead?
Install the CloudWatch Logs agent on Linux servers and the unified CloudWatch agent on Windows servers; configure memory metrics later with CloudWatch Metrics Insights queries.
Use Systems Manager Run Command with the AmazonCloudWatch-ManageAgent document to install the unified CloudWatch agent on every instance and have each agent load its JSON configuration from an SSM Parameter Store key that the operations team manages.
Manually copy the CloudWatch agent configuration file into /opt/aws/amazon-cloudwatch-agent on each instance during user data, then start the agent with the local file path.
Enable AWS Config across all accounts to stream operating-system metrics, including memory, into CloudWatch and configure delivery of log files through the same service.
Answer Description
The CloudWatch agent must be used because the standard CloudWatch metrics do not include memory utilization. The AmazonCloudWatch-ManageAgent SSM document can remotely install or update the unified CloudWatch agent on both Linux and Windows instances. When the agent starts, it can pull its JSON configuration from an SSM parameter, allowing administrators to store and edit a single configuration in Systems Manager Parameter Store and apply it across multiple accounts by using Run Command or State Manager. This removes the need to manually place files on every instance.
Using the legacy CloudWatch Logs agent cannot emit memory metrics, and mixing two different agents increases management effort. Storing the configuration locally on each instance still requires manual distribution whenever changes are needed. AWS Config does not collect operating-system metrics, so enabling it would not meet the monitoring requirement.
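As a sketch, a single Run Command invocation can point every targeted instance at a shared Parameter Store configuration (the tag key and parameter name are hypothetical):

  aws ssm send-command \
    --document-name "AmazonCloudWatch-ManageAgent" \
    --targets "Key=tag:Environment,Values=production" \
    --parameters 'action=configure,mode=ec2,optionalConfigurationSource=ssm,optionalConfigurationLocation=AmazonCloudWatch-Config,optionalRestart=yes'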
Ask Bash
What is the AmazonCloudWatch-ManageAgent SSM document?
What is Systems Manager Parameter Store, and how does it simplify agent configuration?
Why can't the AWS Config service be used to monitor memory utilization?
An Auto Scaling group runs Linux workloads on c5.9xlarge instances that are evenly distributed across three Availability Zones. Operations reports show that short-lived analytics jobs occasionally saturate the instance's 10 Gbps network bandwidth, causing retries and delays. The team needs a quick, low-risk change that provides at least 20 Gbps per instance without redesigning the architecture. Which action meets these requirements?
Move the Auto Scaling group into a cluster placement group that spans the three Availability Zones.
Enable jumbo frames (MTU 9001) on every instance network interface.
Attach an Elastic Fabric Adapter (EFA) to each instance.
Update the launch template to use c5n.9xlarge instances.
Answer Description
c5n.9xlarge instances are drop-in replacements for c5.9xlarge: the network-optimized "n" variant raises available bandwidth from 10 Gbps to up to 50 Gbps on the Nitro platform without any additional configuration. Switching the instance type in the launch template is a minimal, low-risk change that immediately satisfies the >20 Gbps requirement.
Cluster placement groups cannot span multiple Availability Zones, so re-architecting the Auto Scaling group to use one would require sacrificing AZ resilience and still would not raise the hard network cap of the c5 instances. Enabling jumbo frames improves efficiency but cannot raise the instance's maximum throughput limit. Elastic Fabric Adapter is designed for HPC and requires a cluster placement group in a single AZ; it does not increase bandwidth across AZs for general TCP traffic and adds operational complexity.
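For illustration, the swap is a new launch template version plus a default-version bump (the template ID and version numbers are hypothetical):

  aws ec2 create-launch-template-version \
    --launch-template-id lt-0123456789abcdef0 \
    --source-version 1 \
    --launch-template-data '{"InstanceType":"c5n.9xlarge"}'

  aws ec2 modify-launch-template \
    --launch-template-id lt-0123456789abcdef0 \
    --default-version 2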
Ask Bash
What is the difference between c5 and c5n instances in AWS?
What is a cluster placement group, and why doesn't it work across multiple AZs?
What are jumbo frames (MTU 9001), and why don't they help in this scenario?
What is the role of Elastic Fabric Adapter (EFA), and why is it not suitable here?
A company runs a production Amazon RDS for PostgreSQL db.r5.large instance with 2 vCPUs. After enabling Performance Insights, the operations team notices that query latency rises when the database load exceeds the number of vCPUs. They need an automated Systems Manager runbook to execute whenever this situation persists for 5 minutes, while keeping operational overhead low. Which solution meets the requirement?
Create a CloudWatch alarm in the AWS/RDS namespace for the DBLoad metric (statistic: Average, period: 60 seconds, evaluation periods = 5, threshold = 2) and set the alarm action to run the Systems Manager Automation document.
Configure a CloudWatch alarm on the instance's CPUUtilization metric with an 80% threshold for 5 minutes and target the Systems Manager runbook.
Enable Enhanced Monitoring at 1-second granularity and deploy a Lambda function that polls CPU metrics every minute; if CPUUtilization > 80% for 5 checks, invoke the runbook.
Create an RDS event subscription for source type 'db-instance' and event category 'failure'; subscribe an SNS topic that triggers the Systems Manager runbook.
Answer Description
Performance Insights automatically publishes the DBLoad (average active sessions) metric to the AWS/RDS namespace in CloudWatch. A common best practice is to compare DBLoad to the vCPU count; sustained values above the vCPU count indicate CPU saturation. Creating a CloudWatch alarm on the DBLoad metric with a period of 60 seconds, five evaluation periods, and a threshold of 2 (the vCPU count) directly monitors the required condition. CloudWatch alarms can invoke Systems Manager runbooks through alarm actions, so no custom polling or additional services are needed.
The CPUUtilization metric does not measure active sessions and can miss database-specific contention like I/O waits. Building a custom Lambda poller with Enhanced Monitoring adds unnecessary complexity and operational overhead. RDS event subscriptions do not emit events for high DB load; they are for state changes like failures or reboots, so they cannot trigger the runbook for this scenario.
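A sketch of the alarm described above (the DB identifier is hypothetical, and the automation-definition ARN format for the alarm action is an assumption based on the scenario):

  aws cloudwatch put-metric-alarm \
    --alarm-name rds-dbload-above-vcpus \
    --namespace AWS/RDS \
    --metric-name DBLoad \
    --dimensions Name=DBInstanceIdentifier,Value=prod-postgres \
    --statistic Average \
    --period 60 \
    --evaluation-periods 5 \
    --threshold 2 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions 'arn:aws:ssm:us-east-1:111122223333:automation-definition/ClearAppCaches:$DEFAULT'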
Ask Bash
What is Performance Insights in Amazon RDS?
What is the DBLoad metric used in CloudWatch?
How do CloudWatch alarms and Systems Manager Automation documents work together?
Why is the DBLoad metric preferred over CPUUtilization for monitoring database performance?
Your team operates a production Amazon EKS cluster that uses managed node groups. You must begin streaming application container logs and granular CPU, memory, disk, and network metrics for every pod to Amazon CloudWatch with minimal ongoing maintenance. You prefer an AWS-managed solution rather than hand-built agents. Which approach meets these requirements?
Enable Container Insights for the cluster by using the CloudWatch console or eksctl, which installs the AWS CloudWatch agent and Fluent Bit DaemonSets that forward metrics and logs to CloudWatch.
Deploy the Prometheus Operator and Grafana inside the cluster, then configure a community exporter to push scraped metrics to CloudWatch.
Add a user-data script to each node group that installs and starts the CloudWatch agent as a systemd service to collect host metrics and the /var/log/containers directory.
Turn on control-plane logging for the cluster so that the API server automatically emits all pod metrics and container log streams to CloudWatch Logs.
Answer Description
Enabling Container Insights for the EKS cluster is an AWS-managed feature that deploys the CloudWatch agent and Fluent Bit as DaemonSets. The agent sends detailed node, pod, and container metrics to CloudWatch, and Fluent Bit streams application logs to CloudWatch Logs. After the one-time enablement (see the example after the list below), updates to the agents are handled by AWS, so ongoing maintenance is minimal. The other options do not satisfy the stated constraints:
- Control-plane logging only delivers API server logs; it does not provide container metrics or workload log files.
- Prometheus and Grafana are self-managed and require ongoing patching and scaling, and a custom exporter would still need to be maintained.
- Installing the CloudWatch agent through user-data captures host metrics but not container-level metrics and does not address log collection; it also leaves lifecycle management to the operations team.
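As a sketch of the one-time enablement, the managed add-on route looks like this (the cluster name is hypothetical; the add-on installs the CloudWatch agent and Fluent Bit DaemonSets):

  aws eks create-addon \
    --cluster-name prod-cluster \
    --addon-name amazon-cloudwatch-observability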
Ask Bash
What are Amazon EKS managed node groups?
What components are included in Container Insights for Amazon EKS?
How does Fluent Bit work in the context of container log streaming?
What advantages does enabling Container Insights have compared to custom agents?
Why are control-plane logs insufficient for container monitoring?
A CloudOps engineer manages an Auto Scaling group of t3.small instances running a latency-sensitive REST API. The p95 request latency occasionally increases to several seconds even though the CloudWatch CPUUtilization metric never rises above 20%. During the same periods, the CPUCreditBalance metric falls to 0 for every instance. What is the most cost-effective change that resolves the performance issue?
Convert the Auto Scaling group to run Spot Instances of the same t3.small instance type.
Add a scaling policy that doubles the desired capacity when CPUUtilization exceeds 60%.
Replace the t3.small instances with m6i.large instances in the launch template.
Modify the launch template so the Auto Scaling group uses T3 Unlimited mode.
Answer Description
The low CPUUtilization combined with a CPUCreditBalance of 0 indicates that the burstable T3 instances are being throttled after depleting their earned CPU credits. Enabling T3 Unlimited lets the instances draw surplus credits, eliminating throttling while keeping the same instance family and size. Surplus credits incur a small additional fee only when used, which is typically cheaper than permanently moving to larger general-purpose instances. Changing scaling policies or using Spot capacity does not address the credit exhaustion, and larger fixed-performance instances remove the throttling but at a higher constant cost.
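For illustration, the launch template change is a one-line credit specification (the template ID is hypothetical):

  aws ec2 create-launch-template-version \
    --launch-template-id lt-0123456789abcdef0 \
    --source-version 1 \
    --launch-template-data '{"CreditSpecification":{"CpuCredits":"unlimited"}}'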
Ask Bash
What are T3 instances, and why are they burstable?
What is T3 Unlimited mode, and how does it resolve credit exhaustion?
What is the CPUCreditBalance metric in CloudWatch, and why does it matter?
A company uses a single AWS CloudFormation template to deploy a three-tier application that includes Auto Scaling groups and a production Amazon RDS instance. During routine maintenance, an operations engineer must update the stack to patch the application servers. Company policy states that the update must never replace or delete the existing RDS instance. If the template change would cause a replacement, the operation must immediately fail before any resources are modified so the engineer can investigate. Which approach meets these requirements with the least operational effort?
Manually create an RDS snapshot and proceed with the stack update; restore from the snapshot if the database is replaced.
Add the DeletionPolicy and UpdateReplacePolicy attributes with a value of Retain to the RDS resource before updating the stack.
Attach a stack policy that denies all Update:* actions on the RDS resource and then update the stack.
Generate a change set, review it for replacement actions on the RDS resource, and execute the change set only if none are found.
Answer Description
Applying a stack policy that denies all Update actions on the RDS resource blocks CloudFormation from creating a replacement DB instance. When an update would trigger a Replace action on that resource, CloudFormation detects the policy violation, cancels the operation, and rolls the stack back before it touches other resources. This satisfies the requirement automatically and does not require manual review.
Previewing a change set still relies on a human to cancel execution, so a replacement could proceed by mistake. Setting DeletionPolicy or UpdateReplacePolicy to Retain only preserves the old DB instance after a replacement; it does not stop the replacement from occurring. Creating manual snapshots adds operational overhead and likewise does not prevent replacement.
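A stack policy sketch that permits updates everywhere except the database (the logical resource ID is hypothetical):

  {
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": "*",
        "Action": "Update:*",
        "Resource": "*"
      },
      {
        "Effect": "Deny",
        "Principal": "*",
        "Action": "Update:*",
        "Resource": "LogicalResourceId/ProductionDatabase"
      }
    ]
  }

It can be attached with aws cloudformation set-stack-policy --stack-name app-stack --stack-policy-body file://policy.json.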
Ask Bash
What is a stack policy in AWS CloudFormation?
How does AWS CloudFormation handle a policy violation during a stack update?
What is the difference between DeletionPolicy and UpdateReplacePolicy in AWS CloudFormation?
A company streams AWS CloudTrail management events from its production account to an existing CloudWatch Logs log group named ProdTrail. Security engineers need a solution that triggers an alert within 1 minute whenever a DeleteBucket API call is written to the log group. The alert must appear as a CloudWatch alarm and send an email through Amazon SNS. Which set of actions meets these requirements with the least operational overhead?
Create an EventBridge rule that matches DeleteBucket events from aws.s3 and sends them to an SNS topic; rely on EventBridge metrics for monitoring.
Create a metric filter on the ProdTrail log group with pattern { $.eventName = "DeleteBucket" }, publish it to a custom CloudWatch metric, and add a 1-minute CloudWatch alarm that notifies an SNS topic.
Configure an S3 event notification on the log bucket that invokes a Lambda function; have the function scan each log file for DeleteBucket events and publish a message to SNS.
Enable CloudTrail Insights on the trail and configure the trail to deliver Insight events to an SNS topic subscribed by the security team.
Answer Description
The simplest native approach is to use CloudWatch Logs metric filters. A metric filter applied to the ProdTrail log group with the pattern { $.eventName = "DeleteBucket" } turns every matching event into a custom CloudWatch metric. A CloudWatch alarm that monitors this metric with a 1-minute period and one evaluation can change its state almost immediately after a matching log entry appears. Configuring the alarm action to publish to an SNS topic delivers the required email notification. The other choices either do not generate a CloudWatch alarm (EventBridge rule), rely on Insights events that only detect anomalies (CloudTrail Insights), or introduce unnecessary custom code and maintenance (Lambda parser for S3 object events).
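As a sketch, the filter and alarm pair could be created like this (the metric namespace, names, and topic ARN are hypothetical):

  aws logs put-metric-filter \
    --log-group-name ProdTrail \
    --filter-name DeleteBucketEvents \
    --filter-pattern '{ $.eventName = "DeleteBucket" }' \
    --metric-transformations metricName=DeleteBucketCount,metricNamespace=Security,metricValue=1

  aws cloudwatch put-metric-alarm \
    --alarm-name s3-delete-bucket-detected \
    --namespace Security \
    --metric-name DeleteBucketCount \
    --statistic Sum \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions arn:aws:sns:us-east-1:111122223333:security-alerts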
Ask Bash
What is a CloudWatch Logs metric filter?
How does SNS work with CloudWatch alarms?
What is the difference between CloudTrail Insights and CloudTrail management events?
What is the benefit of using CloudWatch Logs metric filters over other approaches?
A company uses two private subnets, one in each of two Availability Zones (AZ-A and AZ-B). All outbound internet traffic is routed through a single NAT gateway that is deployed in a public subnet in AZ-A. After an unplanned AZ-A outage, instances in AZ-B lost internet connectivity. The operations team must improve fault tolerance and reduce inter-AZ data processing charges while keeping administration effort low. What should the team do?
Create a second NAT gateway in a public subnet in AZ-B and update the AZ-B private subnet's route table to use that gateway.
Move the existing NAT gateway to a shared services VPC in AZ-A and route both private subnets to it through VPC peering connections.
Attach an internet gateway directly to each private subnet and add a 0.0.0.0/0 route pointing to it.
Replace the NAT gateway with auto-scaled NAT instances placed in each AZ and manage failover with a Network Load Balancer.
Answer Description
A NAT gateway is an AWS-managed service that is redundant only within its own AZ; if that AZ fails, the gateway becomes unavailable. Best practice is to create a separate NAT gateway in every AZ and configure each private subnet's route table to use the local gateway. This restores internet access during an AZ outage and prevents traffic from crossing AZ boundaries, eliminating inter-AZ data transfer fees. NAT instances add operational overhead and do not provide built-in redundancy; a single gateway cannot survive an AZ failure; and attaching an internet gateway directly to a private subnet is not possible because IGWs must be associated with the VPC as a whole and require public IP addresses on the instances.
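For illustration (the subnet, allocation, and route table IDs are hypothetical):

  # Create a NAT gateway in a public subnet in AZ-B
  aws ec2 create-nat-gateway \
    --subnet-id subnet-0b1c2d3e4f5a6b7c8 \
    --allocation-id eipalloc-0a1b2c3d4e5f60789

  # Point the AZ-B private route table at the new local gateway
  aws ec2 replace-route \
    --route-table-id rtb-0f1e2d3c4b5a69788 \
    --destination-cidr-block 0.0.0.0/0 \
    --nat-gateway-id nat-00112233445566778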
Ask Bash
What is a NAT gateway, and how does it work?
Why does creating NAT gateways in each Availability Zone improve fault tolerance?
What are the inter-AZ data processing charges, and how does this setup reduce them?
A CloudOps engineer is configuring Amazon CloudWatch to scale an Auto Scaling group. The group must launch one additional EC2 instance whenever the average CPUUtilization metric stays above 70 percent for 5 consecutive minutes. The solution must work without relying on EventBridge rules, Lambda functions, or other custom code. Which type of action should the engineer attach to the CloudWatch alarm to meet these requirements?
Attach the ARN of a scaling policy associated with the Auto Scaling group.
Publish the alarm state to an Amazon SNS topic that triggers a Lambda function to add capacity.
Use an EC2 recovery action ARN so the instance restarts when the threshold is breached.
Specify a Systems Manager Automation document that uses the EC2:Run command to start a new instance.
Answer Description
CloudWatch alarms can directly invoke several types of predefined actions. When you need an Auto Scaling group to add or remove capacity without any intermediary service, you attach the ARN of an Auto Scaling scaling policy to the alarm. When the alarm transitions into the ALARM state, CloudWatch signals that scaling policy, which immediately updates the group's desired capacity and launches an additional instance. Publishing to an SNS topic and then invoking Lambda introduces extra components and does not satisfy the "no custom code" constraint. Systems Manager Automation documents and EC2 recovery actions are valid alarm actions, but neither changes Auto Scaling group capacity.
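As a sketch (group and policy names are hypothetical), the scaling policy's ARN is captured and attached as the alarm action:

  POLICY_ARN=$(aws autoscaling put-scaling-policy \
    --auto-scaling-group-name web-asg \
    --policy-name add-one-instance \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 1 \
    --query PolicyARN --output text)

  aws cloudwatch put-metric-alarm \
    --alarm-name asg-cpu-high \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=AutoScalingGroupName,Value=web-asg \
    --statistic Average \
    --period 60 \
    --evaluation-periods 5 \
    --threshold 70 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$POLICY_ARN"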
Ask Bash
What is an Auto Scaling group in AWS?
What is a CloudWatch alarm and how does it work?
What is the purpose of a scaling policy in an Auto Scaling group?
How does a CloudWatch alarm interact with an Auto Scaling group?
A CloudOps engineer must automate backups of an Amazon S3 bucket that stores compliance records. The solution must run a backup at midnight every day, retain each recovery point for 7 years, and allow one-click point-in-time restore. Backups must be immutable and managed centrally with the least operational overhead. Which approach meets these requirements by following AWS best practices?
Configure same-Region cross-account replication to a backup account and apply a lifecycle expiration rule on the destination bucket after 7 years.
Use AWS Backup to assign the bucket to a daily backup plan that stores recovery points in a vault with a 7-year retention period and Vault Lock enabled.
Create a CloudWatch Events rule that invokes an AWS Lambda function daily to copy objects to another bucket configured with S3 Object Lock in compliance mode for 7 years.
Enable S3 versioning and add a lifecycle rule that moves noncurrent object versions to Amazon S3 Glacier Deep Archive after 24 hours.
Answer Description
AWS Backup provides fully managed, incremental backups of Amazon S3 buckets. By assigning the bucket to a daily backup plan, recovery points are created automatically and stored in a backup vault. The vault can be protected with Vault Lock so that retention rules (for example, 7 years) cannot be altered, delivering immutability. The service also offers console-based, point-in-time restores and central policy management. Versioning with lifecycle rules, cross-region replication, or custom Lambda copy jobs can offer extra copies but do not provide centrally enforced retention, immutable recovery points, or the simplified restore workflow required.
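A sketch of the plan and lock settings (the vault and plan names are hypothetical; 2,555 days is roughly 7 years):

  aws backup create-backup-plan --backup-plan '{
    "BackupPlanName": "s3-compliance-daily",
    "Rules": [{
      "RuleName": "daily-midnight",
      "TargetBackupVaultName": "compliance-vault",
      "ScheduleExpression": "cron(0 0 * * ? *)",
      "Lifecycle": { "DeleteAfterDays": 2555 }
    }]
  }'

  aws backup put-backup-vault-lock-configuration \
    --backup-vault-name compliance-vault \
    --min-retention-days 2555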
Ask Bash
What is AWS Backup?
What is Vault Lock in AWS Backup?
How does AWS Backup support point-in-time restores?
A company runs a multi-tenant application on three Auto Scaling groups behind an Application Load Balancer (ALB). The DNS name for the ALB is the primary record in a Route 53 failover policy that redirects traffic to a standby stack in another Region. The operations team must ensure that Route 53 fails over only when all three microservices in the primary Region are unavailable. Which approach provides the required behavior with the least operational overhead?
Create three HTTP health checks, one for each microservice, and create a calculated health check that is healthy only when all child health checks are healthy. Associate the calculated check with the primary record.
Create three HTTP health checks, one for each microservice's /status endpoint, and then create a calculated health check that is healthy when at least one child health check is healthy. Attach the calculated health check to the primary failover record.
Define a CloudWatch alarm that monitors the ALB's UnHealthyHostCount metric and link that alarm to a Route 53 metric-based health check attached to the primary record.
Configure one HTTP health check that monitors the ALB DNS name and set the failure threshold to three consecutive failed checks.
Answer Description
A Route 53 calculated health check can aggregate the status of other Route 53 health checks. By creating one HTTP health check for each microservice's /status endpoint and then creating a calculated health check that is considered healthy when at least one of the three child checks is healthy, the parent check becomes unhealthy only when every microservice is down. Associating this calculated health check with the primary failover record causes Route 53 to direct traffic to the standby stack only after all three microservices have failed.
A single health check against the ALB DNS name becomes unhealthy if the load balancer itself or any target returns errors, so it can trigger premature failover. Monitoring an ALB CloudWatch metric through a metric-based health check requires additional alarms and offers no advantage over using native Route 53 health checks. Configuring a calculated health check that is healthy only when all child checks are healthy would mark the primary as unhealthy as soon as one microservice fails, resulting in unnecessary failover.
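For illustration, the parent check could be created like this (the child health check IDs are hypothetical; HealthThreshold of 1 keeps the parent healthy while at least one child is healthy):

  aws route53 create-health-check \
    --caller-reference calc-check-001 \
    --health-check-config '{
      "Type": "CALCULATED",
      "ChildHealthChecks": [
        "11111111-aaaa-bbbb-cccc-000000000001",
        "11111111-aaaa-bbbb-cccc-000000000002",
        "11111111-aaaa-bbbb-cccc-000000000003"
      ],
      "HealthThreshold": 1
    }'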
Ask Bash
What is a calculated health check in Route 53?
How does Route 53 failover policy interact with health checks?
Why is using a single health check on an ALB DNS name not ideal for failover in this scenario?
Why is ‘healthy when at least one child check is healthy’ better in this scenario?
What are the differences between ALB health checks and Route 53 health checks?
Neat!
Looks like that's it! You can go back and review your answers or click the button below to grade your test.