Deployment and Operationalization Flashcards
AWS Machine Learning Engineer Associate MLA-C01 Flashcards
| Front | Back |
| --- | --- |
| Approach for handling imbalanced traffic across endpoints | Use traffic shifting with weighted routing for dynamic load distribution |
| Benefits of using Amazon CloudFront in ML deployments | Speeds up content delivery and reduces latency with a global CDN |
| Benefits of using AWS autoscaling for SageMaker endpoints | Automatically adjusts resources based on traffic demands |
| Best practices for AWS Lambda function design for inference | Optimize memory and runtime for lower costs and higher throughput |
| Challenges of scaling ML models horizontally | Increased costs and possible inconsistencies in system state between replicas |
| Common challenge in deploying ML models and its solution | Model drift; implement regular re-training and monitoring |
| Concept of model monitoring in production | Ensuring performance and detecting drift in model predictions |
| Difference between batch and real-time inference | Batch inference processes data in large chunks on a schedule, while real-time inference returns low-latency predictions per request |
| Difference between online and offline model retraining | Online retraining updates a model incrementally with new data, while offline retraining requires retraining from scratch |
| Difference between synchronous and asynchronous inference in SageMaker | Synchronous (real-time) inference returns predictions in the response immediately, while asynchronous inference queues requests and delivers results once processing completes, suiting large payloads and long-running jobs |
| Difference between warm starts and cold starts in AWS Lambda | Warm starts reuse the existing container, while cold starts initiate a new one, causing a delay |
| Explain the purpose of model explainability in deployment | Understand and validate model predictions to ensure transparency and fairness |
| How Amazon SageMaker aids in model deployment | Provides a managed service for hosting endpoints and scaling models |
| How AWS Inferentia accelerates ML inference workloads | Uses dedicated chips optimized for high-performance ML tasks at lower costs |
| How AWS Step Functions assist in complex ML workflows | Orchestrates multiple services and tasks into a serverless workflow |
| How Data Wrangler simplifies data pre-processing in SageMaker | Offers a visual interface for cleaning, transforming, and analyzing data |
| How to connect Amazon S3 with SageMaker for model deployment | Use S3 for retrieving and storing model artifacts |
| How to enable monitoring for a SageMaker endpoint | Set up CloudWatch alarms and logs for performance metrics |
| How to implement feature logging during inference | Store input features along with predictions for monitoring and debugging |
| How to version machine learning models in S3 | Use distinct prefixes or labels to track different versions of artifacts |
| Impact of cold starts on real-time serving latency | Initial container start-up delays inference, affecting response time for users |
| Importance of scaling ML models in production | Handling increasing load and maintaining low latency |
| Key advantage of deploying machine learning models on AWS | Scalability and cost efficiency |
| Key logging tools for monitoring ML models on AWS | Amazon CloudWatch (metrics, logs, and alarms) and logs archived to Amazon S3 |
| Methodology for A/B testing ML models on SageMaker | Deploy multiple endpoints and route traffic proportionally to assess performance |
| Purpose of AWS Lambda in operationalizing ML models | Automating inference tasks with serverless compute |
| Purpose of containerization in ML model deployment | Ensures consistency across environments and simplifies scalability |
| Purpose of SageMaker Multi-Model endpoints | Hosts multiple models on a single endpoint to optimize costs and resource use |
| Purpose of using a custom Docker container in SageMaker | Allows packaging specific dependencies and configurations required by the model |
| Role of Amazon S3 in model deployment | Storing model artifacts and data for inference |
| Role of Elastic Load Balancing in ML model deployment | Distributes incoming traffic across multiple instances to ensure availability and reliability |
| Security measures for deploying ML models on AWS | Configure least-privilege IAM roles, encrypt data at rest and in transit, and apply network isolation such as VPC endpoints |
| Steps to create a SageMaker endpoint | Create a model from the container and artifact, define an endpoint configuration, then create the endpoint (see the boto3 sketch after this deck) |
| Strategies for reducing latency in real-time model inference | Optimize code and infrastructure, use GPUs for large computations |
| Use case for batch inference in a deployment scenario | Processing large datasets periodically for predictions |
| What is endpoint lifecycle management in SageMaker | Managing creation, scaling, and deletion of endpoints |
| What is the role of autoscaling policies in AWS application load balancers | Scale instances in or out based on metrics such as request count per target or response latency to manage changing traffic loads |
| What is the role of AWS API Gateway in ML operationalization | Acts as an interface to invoke ML models hosted on AWS |
| Why is canary deployment used in ML operations | Gradually routes traffic to a new model version to test changes and minimize risks |
This deck explores methodologies for deploying, monitoring, and scaling machine learning models with AWS services such as S3, Lambda, and SageMaker endpoints.
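The cards above describe these workflows at flashcard depth; the sketches that follow show what a few of them can look like in boto3. They are minimal sketches, not definitive implementations, and every resource name, ARN, bucket, and image URI in them is hypothetical. First, the S3 versioning card: tracking model artifacts with distinct version prefixes, assuming a bucket named `my-ml-bucket`.

```python
import boto3

s3 = boto3.client("s3")

# Store each model build under its own version prefix (hypothetical bucket/keys).
s3.upload_file(
    Filename="model.tar.gz",
    Bucket="my-ml-bucket",
    Key="models/churn/v2/model.tar.gz",  # the version lives in the key prefix
)

# Enumerate existing versions by listing the shared prefix.
resp = s3.list_objects_v2(Bucket="my-ml-bucket", Prefix="models/churn/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```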
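Creating a SageMaker endpoint follows the three steps on the "Steps to create a SageMaker endpoint" card: register a model, define an endpoint configuration, then create the endpoint. A minimal sketch, assuming the artifact above plus a hypothetical ECR image and execution role.

```python
import boto3

sm = boto3.client("sagemaker")

# 1. Register the model: container image + S3 model artifact.
sm.create_model(
    ModelName="churn-model-v1",  # hypothetical name
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/churn:latest",  # hypothetical image
        "ModelDataUrl": "s3://my-ml-bucket/models/churn/v1/model.tar.gz",      # hypothetical artifact
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # hypothetical role
)

# 2. Describe how the endpoint should host the model (instance type and count).
sm.create_endpoint_config(
    EndpointConfigName="churn-config-v1",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "churn-model-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
)

# 3. Create the endpoint from the configuration (provisioning runs asynchronously).
sm.create_endpoint(EndpointName="churn-endpoint", EndpointConfigName="churn-config-v1")
```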
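Autoscaling for a SageMaker endpoint (the autoscaling card) is configured through Application Auto Scaling rather than on the endpoint itself. A sketch of a target-tracking policy on invocations per instance, assuming the hypothetical endpoint and variant created above.

```python
import boto3

aas = boto3.client("application-autoscaling")

# The scalable target is the desired instance count of one endpoint variant.
resource_id = "endpoint/churn-endpoint/variant/AllTraffic"  # hypothetical endpoint/variant

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking: keep invocations per instance near the target value.
aas.put_scaling_policy(
    PolicyName="churn-invocations-target",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
```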
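For the A/B testing, weighted routing, and canary cards, a traffic split can be expressed as weighted production variants in a single endpoint configuration, with the weights shifted later without redeploying. A sketch assuming two hypothetical model versions registered as above.

```python
import boto3

sm = boto3.client("sagemaker")

# Two variants share traffic 90/10: a canary-style split for the candidate model.
sm.create_endpoint_config(
    EndpointConfigName="churn-config-ab",
    ProductionVariants=[
        {"VariantName": "Current", "ModelName": "churn-model-v1",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.9},
        {"VariantName": "Candidate", "ModelName": "churn-model-v2",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.1},
    ],
)

# Later, shift more traffic to the candidate without touching the endpoint config.
sm.update_endpoint_weights_and_capacities(
    EndpointName="churn-endpoint",
    DesiredWeightsAndCapacities=[
        {"VariantName": "Current", "DesiredWeight": 0.5},
        {"VariantName": "Candidate", "DesiredWeight": 0.5},
    ],
)
```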
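For the Lambda and API Gateway cards, a handler can forward the request payload to the endpoint and log input features alongside the prediction (the feature-logging card). A sketch assuming an API Gateway proxy event with a JSON body and a hypothetical `ENDPOINT_NAME` environment variable; creating the client outside the handler lets warm starts reuse it.

```python
import json
import os

import boto3

# Client created at module load is reused across warm invocations.
runtime = boto3.client("sagemaker-runtime")
ENDPOINT = os.environ.get("ENDPOINT_NAME", "churn-endpoint")  # hypothetical env var


def handler(event, context):
    """Forward a JSON payload from API Gateway to a SageMaker endpoint."""
    features = json.loads(event["body"])  # assumes an API Gateway proxy event
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT,
        ContentType="application/json",
        Body=json.dumps(features),
    )
    prediction = json.loads(response["Body"].read())
    # Log features with the prediction so drift and errors can be investigated later.
    print(json.dumps({"features": features, "prediction": prediction}))
    return {"statusCode": 200, "body": json.dumps(prediction)}
```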
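Finally, the monitoring cards: a simple starting point is a CloudWatch alarm on the built-in `AWS/SageMaker` endpoint metrics, here `ModelLatency`, which CloudWatch reports in microseconds. A sketch assuming a hypothetical SNS topic for notifications.

```python
import boto3

cw = boto3.client("cloudwatch")

# Alarm when average model latency stays above ~200 ms for three minutes.
cw.put_metric_alarm(
    AlarmName="churn-endpoint-high-latency",  # hypothetical alarm name
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",                # reported in microseconds
    Dimensions=[
        {"Name": "EndpointName", "Value": "churn-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=200_000,                        # 200 ms expressed in microseconds
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-alerts"],  # hypothetical SNS topic
)
```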