GCP Professional Cloud Architect Practice Question

An online ticketing platform runs a stateless container-based API that sees an average load of 50 requests per second (RPS) but experiences unpredictable flash-sale surges of up to 5 000 RPS for only a few minutes at a time. The current 3-node GKE Standard cluster is sized for peak demand, resulting in very low utilization most of the day and high operational overhead. You must redesign the compute layer so that it 1) keeps p99 latency below 300 ms during spikes, 2) eliminates almost all idle capacity costs, and 3) minimizes day-to-day infrastructure management effort. Which approach best meets these requirements?

Replace the GKE cluster with a regional managed instance group of 30 e2-standard-4 VMs behind an external HTTP(S) load balancer and enable autoscaling at 60 % CPU utilization.
Keep the existing GKE cluster and enable Cluster Autoscaler, setting the node pool to scale between 1 and 30 nodes and use a Horizontal Pod Autoscaler at 70 % CPU utilization.
Package the API into a container, deploy it to Cloud Run with minInstances set to 0 and CPU allocated only during request processing (CPU always allocated disabled), and rely on automatic request-based scaling for traffic spikes.
Convert the current nodes to preemptible VMs to lower per-instance price and disable autoscaling to prevent scale-up delays during traffic spikes.

Answer Description

Migrating the stateless containers to Cloud Run and allowing the service to scale from zero instances delivers the greatest flexibility and cost efficiency. Cloud Run scales on incoming requests rather than on a lagging CPU metric, and it typically brings new container instances online within seconds, which helps keep p99 latency low during sudden traffic bursts. Each instance can also serve many requests concurrently, so a surge needs far fewer instances than requests in flight. With minInstances set to 0 and CPU allocated only while requests are being processed, no compute charges accrue while the service is idle, so idle-capacity cost is effectively removed. Operations overhead is minimal because Google fully manages the underlying infrastructure and autoscaling.
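As a rough illustration of why request-based scaling with per-instance concurrency copes with the spike, the sketch below applies Little's law (requests in flight = arrival rate × latency) to the traffic figures in the question. The 200 ms mean latency is an assumption made purely for illustration, and the concurrency of 80 is Cloud Run's default setting; real capacity depends on how the API actually behaves under load.

```python
import math


def instances_needed(rps: int, mean_latency_ms: int, concurrency: int) -> int:
    """Estimate container instances via Little's law.

    Requests in flight = rps * latency (in seconds); each instance is assumed
    to serve up to `concurrency` requests at the same time.
    """
    if rps <= 0:
        return 0  # no traffic: the service can sit at zero instances
    # Integer milliseconds keep the arithmetic exact before the division.
    return math.ceil(rps * mean_latency_ms / (1000 * concurrency))


# Assumed values (not from the question): 200 ms mean latency, concurrency 80.
baseline = instances_needed(rps=50, mean_latency_ms=200, concurrency=80)
surge = instances_needed(rps=5000, mean_latency_ms=200, concurrency=80)
print(f"baseline: {baseline} instance(s), flash sale: {surge} instance(s)")
```

Under these assumptions the service needs about 1 instance at average load and about 13 during a flash sale, so the platform only has to start roughly a dozen instances when the surge arrives, and none of them are billed once traffic subsides.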

Enabling GKE Cluster Autoscaler can reduce, but not eliminate, idle cost: at least one node must remain running, and new nodes can take minutes to provision, which risks violating the latency SLO during a surge that lasts only a few minutes. A regional managed instance group cannot scale to zero, new VMs need time to boot before they can absorb a spike, and the team must still maintain instance templates and OS images, so cost and management overhead remain high. Running fixed preemptible nodes without autoscaling lowers the price per VM but still incurs charges for continuously running instances, removes elasticity, and exposes the API to preemption, so it fails both the cost and latency goals. Therefore, Cloud Run with scale-to-zero and high concurrency is the only option that satisfies all three requirements.
