A machine-learning engineer must deploy a Docker-packaged real-time fraud-detection model. Traffic is usually near zero but can spike unpredictably to thousands of requests per second. The business wants to pay nothing while the service is idle yet keep end-to-end inference latency below 100 ms during spikes. Which cloud deployment approach best meets these requirements?
Host the container on a single, large bare-metal instance to eliminate virtualization overhead.
Run the model on a managed Kubernetes cluster with a fixed-size node pool sized for average traffic.
Provision a fleet of reserved virtual machines sized for the maximum anticipated peak load.
Deploy the container on a request-driven serverless container platform that supports automatic scale-to-zero (for example, Google Cloud Run or Azure Container Apps).
A request-driven serverless container platform, such as Google Cloud Run or Azure Container Apps, meets both goals. These services automatically scale container instances from zero to many within seconds and bill per request, so no compute costs accrue while the application is idle. You can also pin a small minimum instance count to keep one warm container ready, which avoids cold-start delays and keeps response times under 100 ms even during sudden bursts. In contrast, reserving VM capacity or using a fixed-size Kubernetes node pool wastes money during idle periods, while a single bare-metal host cannot absorb large spikes.
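To make the deployment contract concrete, here is a minimal sketch of the kind of HTTP service such a container would run. It assumes the Cloud Run convention of reading the listening port from the PORT environment variable; the endpoint path and the score_transaction function are hypothetical stand-ins for a real trained model, used only to illustrate the request-driven shape of the service.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def score_transaction(features):
    # Hypothetical placeholder for the real fraud model: flag
    # unusually large transaction amounts as likely fraud.
    return {"fraud_probability": 0.9 if features.get("amount", 0) > 10_000 else 0.1}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON request body into a feature dictionary.
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(score_transaction(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Serverless container platforms such as Cloud Run inject the
    # port to listen on via the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Once the image is built and pushed, platforms like Cloud Run let you set a small minimum instance count at deploy time (for example, the --min-instances flag on gcloud run deploy) so one warm container absorbs the first requests of a spike while additional instances scale out behind it.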