Deployment and Operationalization Flashcards
AWS Machine Learning Engineer Associate MLA-C01 Flashcards

| Front | Back |
| --- | --- |
| Approach for handling imbalanced traffic across endpoints | Use traffic shifting with weighted routing for dynamic load distribution |
| Benefits of using Amazon CloudFront in ML deployments | Speeds up content delivery and reduces latency with a global CDN |
| Benefits of using AWS autoscaling for SageMaker endpoints | Automatically adjusts instance counts to match traffic demand (see the autoscaling sketch after this table) |
| Best practices for AWS Lambda function design for inference | Optimize memory and runtime for lower costs and higher throughput |
| Challenges of scaling ML models horizontally | Increased costs and possible inconsistencies in system state between replicas |
| Common challenge in deploying ML models and its solution | Model drift; implement regular re-training and monitoring |
| Concept of model monitoring in production | Ensuring performance and detecting drift in model predictions |
| Difference between batch and real-time inference | Batch processes data in bulk on a schedule, while real-time returns low-latency predictions per request |
| Difference between online and offline model retraining | Online retraining updates a model incrementally as new data arrives, while offline retraining rebuilds it from scratch on the full dataset |
| Difference between synchronous and asynchronous inference in SageMaker | Synchronous (real-time) inference returns predictions in the same request-response cycle, while asynchronous inference queues requests and writes results to S3 (see the sketch at the end of this page) |
| Difference between warm starts and cold starts in AWS Lambda | Warm starts reuse the existing container, while cold starts initiate a new one, causing a delay |
| Explain the purpose of model explainability in deployment | Understand and validate model predictions to ensure transparency and fairness |
| How Amazon SageMaker aids in model deployment | Provides a managed service for hosting endpoints and scaling models |
| How AWS Inferentia accelerates ML inference workloads | Uses dedicated chips optimized for high-performance ML tasks at lower costs |
| How AWS Step Functions assist in complex ML workflows | Orchestrates multiple services and tasks into a serverless workflow |
| How Data Wrangler simplifies data pre-processing in SageMaker | Offers a visual interface for cleaning, transforming, and analyzing data |
| How to connect Amazon S3 with SageMaker for model deployment | Use S3 for retrieving and storing model artifacts |
| How to enable monitoring for a SageMaker endpoint | Set up CloudWatch alarms and logs for performance metrics |
| How to implement feature logging during inference | Store input features along with predictions for monitoring and debugging |
| How to version machine learning models in S3 | Use distinct key prefixes or S3 object versioning to track artifact versions (see the sketch further down) |
| Impact of cold starts on real-time serving latency | Initial container start-up delays inference, affecting response time for users |
| Importance of scaling ML models in production | Handling increasing load and maintaining low latency |
| Key advantage of deploying machine learning models on AWS | Scalability and cost efficiency |
| Key logging tools for monitoring ML models on AWS | Amazon CloudWatch and inference logs stored in Amazon S3 |
| Methodology for A/B testing ML models on SageMaker | Deploy multiple production variants behind a single endpoint and split traffic by weight to compare performance (see the variant sketch after this table) |
| Purpose of AWS Lambda in operationalizing ML models | Automates inference tasks with serverless compute (see the handler sketch after this table) |
| Purpose of containerization in ML model deployment | Ensures consistency across environments and simplifies scalability |
| Purpose of SageMaker Multi-Model endpoints | Hosts multiple models on a single endpoint to optimize costs and resource use |
| Purpose of using a custom Docker container in SageMaker | Allows packaging specific dependencies and configurations required by the model |
| Role of Amazon S3 in model deployment | Storing model artifacts and data for inference |
| Role of Elastic Load Balancing in ML model deployment | Distributes incoming traffic across multiple instances to ensure availability and reliability |
| Security measures for deploying ML models on AWS | Configure IAM roles, encrypt data at rest and in transit, and restrict network access (e.g., VPC endpoints) |
| Steps to create a SageMaker endpoint | Create a model, create an endpoint configuration, then create the endpoint (see the boto3 sketch after this table) |
| Strategies for reducing latency in real-time model inference | Optimize code and infrastructure, use GPUs for large computations |
| Use case for batch inference in a deployment scenario | Processing large datasets periodically for predictions |
| What is endpoint lifecycle management in SageMaker | Managing creation, scaling, and deletion of endpoints |
| What is the role of autoscaling policies in AWS application load balancers | Scale capacity to hold a target metric, such as request count per target or response latency, steady as traffic changes |
| What is the role of Amazon API Gateway in ML operationalization | Acts as a managed HTTP front door for invoking ML models hosted on AWS |
| Why is canary deployment used in ML operations | Gradually routes traffic to a new model version to test changes and minimize risks |
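The endpoint-creation card above lists three steps. Here is a minimal boto3 sketch of that flow; the model name, ECR image, artifact path, and role ARN are all placeholders, not real resources.

```python
import boto3

sm = boto3.client("sagemaker")

# 1. Register the model: a container image plus model artifacts in S3.
sm.create_model(
    ModelName="demo-model",  # hypothetical name
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest",  # placeholder image
        "ModelDataUrl": "s3://demo-bucket/models/demo/model.tar.gz",          # placeholder artifact
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/DemoSageMakerRole",      # placeholder role
)

# 2. Describe how the model should be hosted.
sm.create_endpoint_config(
    EndpointConfigName="demo-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "demo-model",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
)

# 3. Create the HTTPS endpoint from the configuration.
sm.create_endpoint(EndpointName="demo-endpoint", EndpointConfigName="demo-config")
```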
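For the autoscaling cards, a common setup is a target-tracking policy on invocations per instance, configured through Application Auto Scaling. A sketch, assuming a hypothetical endpoint `demo-endpoint` with a variant named `AllTraffic`:

```python
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/demo-endpoint/variant/AllTraffic"  # hypothetical endpoint/variant

# Allow the variant to scale between 1 and 4 instances.
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking: add or remove instances to hold ~70 invocations
# per instance per minute.
aas.put_scaling_policy(
    PolicyName="demo-invocations-target",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,   # react quickly to spikes
        "ScaleInCooldown": 300,   # scale in conservatively
    },
)
```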
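The A/B testing and canary cards both rest on production variants: two models behind one endpoint with traffic split by weight. A sketch with hypothetical model and endpoint names; a canary shift is then just an update to the weights:

```python
import boto3

sm = boto3.client("sagemaker")

# Two variants behind one endpoint: 90% to the current model, 10% canary.
sm.create_endpoint_config(
    EndpointConfigName="ab-config",  # hypothetical
    ProductionVariants=[
        {"VariantName": "Current", "ModelName": "demo-model",     # placeholder models
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 9.0},
        {"VariantName": "Canary", "ModelName": "demo-model-v2",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 1.0},
    ],
)

# Later, shift more traffic to the new version without redeploying.
sm.update_endpoint_weights_and_capacities(
    EndpointName="demo-endpoint",  # hypothetical endpoint
    DesiredWeightsAndCapacities=[
        {"VariantName": "Current", "DesiredWeight": 5.0},
        {"VariantName": "Canary", "DesiredWeight": 5.0},
    ],
)
```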
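The Lambda and API Gateway cards describe the usual serverless front end: API Gateway receives the HTTP request and a Lambda function forwards it to the endpoint. A minimal handler sketch; the endpoint name is a placeholder:

```python
import json
import boto3

# Client at module scope so warm starts reuse the connection (cold-start card).
runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    """Lambda handler behind an API Gateway proxy integration: forwards the
    raw request body to a SageMaker endpoint and returns the prediction."""
    response = runtime.invoke_endpoint(
        EndpointName="demo-endpoint",   # placeholder endpoint
        ContentType="application/json",
        Body=event["body"],             # proxy integration passes the raw body string
    )
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```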
About the Flashcards
Flashcards for the AWS Machine Learning Engineer Associate exam give you a quick-hit review of deploying and operating machine learning workloads on AWS. Each card drills key definitions, service roles, and workflow steps so you can recall them under time pressure.
You'll refresh the core functions of Amazon SageMaker, S3, Lambda, API Gateway, Step Functions, CloudFront, and Inferentia as they relate to scalable, secure, low-latency inference. Concepts such as endpoint lifecycle management, autoscaling policies, canary deployments, model versioning, batch vs real-time inference, monitoring for drift, and optimizing cold starts are all covered, helping you connect architecture choices to exam scenarios.
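For instance, the model-versioning card comes down to disciplined S3 key layout. A sketch, with the bucket, paths, and version label as placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Version via key prefix: each training run writes under its own version folder.
version = "v2024-06-01"  # hypothetical version label
s3.upload_file(
    "model.tar.gz",                                   # local artifact from training
    "demo-model-bucket",                              # placeholder bucket
    f"models/churn-classifier/{version}/model.tar.gz",
)
# Endpoint configs then pin ModelDataUrl to an exact version prefix,
# so a rollback is just pointing back at the previous prefix.
```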
Topics covered in this flashcard deck:
- SageMaker endpoints & scaling
- Serverless ML with Lambda
- Batch vs real-time inference
- Model monitoring & drift
- Security & IAM controls
- Autoscaling and load balancing
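As a final illustration of the synchronous vs. asynchronous distinction from the cards, asynchronous inference is invoked by pointing the endpoint at a payload already in S3 and reading results from the returned output location. A sketch with placeholder names:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Async inference: the request is queued and results land in S3 rather than
# in the HTTP response, which suits large payloads and long-running models.
response = runtime.invoke_endpoint_async(
    EndpointName="demo-async-endpoint",                    # placeholder endpoint
    InputLocation="s3://demo-bucket/requests/input.json",  # payload already in S3
    ContentType="application/json",
)
print("Results will appear at:", response["OutputLocation"])
```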