Troubleshooting, Monitoring & Best Practices (AB-900) Flashcards
Microsoft 365 Certified: Copilot and Agent Administration Fundamentals AB-900 Flashcards
Study these Troubleshooting, Monitoring & Best Practices flashcards for the Microsoft 365 Certified: Copilot and Agent Administration Fundamentals (AB-900) exam. The set of 40+ flashcards reviews key concepts to reinforce your exam preparation.

| Front | Back |
| --- | --- |
| Action when an agent becomes unresponsive | Capture diagnostics and collect the most recent logs for analysis, then restart the agent |
| Best practice for blue-green deployments | Shift traffic gradually to validate the new release, and roll back quickly on failure |
| Best practice for configuration management | Store configurations in version control and use immutable deployments for reproducibility |
| Best practice for dependency updates | Pin dependency versions, run automated tests, and deploy to a canary before full rollout |
| Best practice for logging sensitive data | Mask or redact PII at the source and never log secrets |
| Common cause of high API costs | Excessive token usage, inefficient prompting, and lack of caching |
| Common cause when Copilot returns irrelevant answers | Insufficient context or an incorrect system prompt; provide clearer context, update the system prompt, and retry |
| First step when latency spikes occur | Check resource utilization (CPU, memory, and network), then correlate it with recent deployments |
| How to audit changes that caused a regression | Review commit history and CI pipeline artifacts, then roll back to the last stable release |
| How to collect logs for an agent | Enable debug logging in the agent configuration, then gather application, system, and transport logs |
| How to confirm data exfiltration risk | Check outbound connections, access logs, and unusual data transfer patterns |
| How to detect security incidents | Monitor for unusual authentication attempts, privilege escalations, and unexpected outbound traffic |
| How to diagnose memory leaks in agents | Monitor memory growth over time and inspect heap dumps and profiler captures |
| How to handle a corrupted model cache | Clear the cache, restart the service, and warm the cache with known-good requests |
| How to handle rate limit errors | Retry with exponential backoff and batch requests where possible |
| How to monitor latency percentiles | Track p50, p90, and p99 latency and prioritize fixes by p99 impact |
| How to monitor model prompt usage | Log prompts and correlate them with cost and performance while applying privacy controls |
| How to perform root cause analysis for errors | Reproduce the issue, capture logs and traces, then narrow it down to a code or infrastructure change |
| How to prevent replay attacks | Use nonces, timestamps, and short-lived tokens |
| How to profile CPU hot spots | Use a sampling profiler to find the functions with the highest CPU time, then optimize or refactor them |
| How to reduce cold start latency | Keep instances warm, use lightweight initialization, and preload models where possible |
| How to reproduce intermittent failures | Record the input and environment state, then run stress tests with the same load profile |
| How to secure logs in transit and at rest | Use TLS in transit, and encryption with access controls for stored logs |
| How to test disaster recovery plans | Run scheduled failover drills and validate data integrity and recovery time objectives (RTOs) |
| How to tune prompt length for performance | Limit context to the necessary tokens and cache static context where possible |
| How to validate agent permissions | Audit IAM role assignments for least privilege and run permission checks |
| Indicator of throttling at the network layer | An increase in connection resets, timeouts, or HTTP 429 (Too Many Requests) responses from services |
| Key indicator of model degradation | A shift in user satisfaction scores or a sudden drop in task completion rate |
| Primary metric to monitor agent health | Heartbeat (liveness signal) frequency and success rate |
| Recommended alerting strategy | Avoid alert fatigue by setting severity thresholds and routing alerts to on-call staff with runbooks |
| Recommended retention policy for logs | Keep high-fidelity logs short term for debugging and aggregated summaries longer term |
| Recommended sleep strategy for retry logic | Use exponential backoff with jitter to avoid the thundering herd problem |
| Steps for secure incident response | Isolate affected systems, preserve evidence, rotate credentials, and perform forensic analysis |
| Tool to centralize logs across instances | A log aggregator such as Elasticsearch, Splunk, or a hosted logging service |
| Typical resolution for authentication failures | Verify credentials and tokens, check for clock skew, and refresh or rotate keys |
| What to do on discovery of leaked keys | Revoke the keys, rotate secrets, and search logs for suspicious usage |
| What to include in a diagnostic bundle | Application logs, configuration files, traces, metrics, and recent deployment manifests |
| When to enable tracing | Enable distributed tracing for requests that span multiple services to identify bottlenecks |
| When to increase agent concurrency | When CPU and memory headroom exists and response latency remains acceptable |
| When to scale horizontally vs. vertically | Scale horizontally for stateless services; scale vertically for a single process bound by CPU |
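
The blue-green card boils down to a gradual traffic split with a fast way back. The Python sketch below illustrates the idea under stated assumptions: `route` and `next_green_percent` are hypothetical helpers, the step sizes (5, 25, 50, 100 percent) and the 1% error-rate threshold are illustrative values, and a real deployment would normally apply the split in a load balancer or service mesh rather than in application code.

```python
import hashlib


def route(request_id: str, green_percent: int) -> str:
    """Route about green_percent% of requests to "green" (new release), the rest to "blue".

    Hashing the ID pins each caller to one environment across requests.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100  # 0..99
    return "green" if bucket < green_percent else "blue"


def next_green_percent(current: int, green_error_rate: float,
                       max_error_rate: float = 0.01,
                       steps: tuple = (5, 25, 50, 100)) -> int:
    """Advance green to the next traffic step, or roll back to 0% if it looks unhealthy."""
    if green_error_rate > max_error_rate:
        return 0  # roll back: all traffic returns to blue
    for step in steps:
        if step > current:
            return step
    return current  # already fully shifted to green


# Usage: at the 25% step, roughly a quarter of hypothetical request IDs reach green.
green_share = sum(route(f"req-{i}", 25) == "green" for i in range(10_000)) / 10_000
print(f"green share: {green_share:.2f}")
print(next_green_percent(25, green_error_rate=0.002))  # 50
print(next_green_percent(25, green_error_rate=0.040))  # 0 (rollback)
```

Because routing is keyed on a stable ID, a user does not bounce between releases mid-session, and rolling back is just setting the green share to 0.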
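
For the sensitive-data card, one way to redact at the source in Python is a `logging.Filter` attached to the logger itself, so records are cleaned before any handler writes them. This is a minimal sketch: the two regular expressions (a simple e-mail pattern and a hypothetical `sk-` API-key format) are assumptions and will not catch every kind of PII, so the safest practice remains not passing secrets to the logger at all.

```python
import io
import logging
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Hypothetical secret format: "sk-" followed by 16+ alphanumerics. Adjust to your own key formats.
SECRET = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def redact(text: str) -> str:
    """Replace e-mail addresses and secret-like tokens with fixed placeholders."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return SECRET.sub("[REDACTED_SECRET]", text)


class RedactingFilter(logging.Filter):
    """Redacts the fully interpolated message before any handler sees the record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # interpolate args, then redact
        record.args = ()                          # args are already folded into msg
        return True                               # keep the (now redacted) record


# Usage: attach the filter to the logger so redaction happens at the source.
stream = io.StringIO()
logger = logging.getLogger("agent")
logger.addHandler(logging.StreamHandler(stream))
logger.addFilter(RedactingFilter())
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output in our own stream only
logger.info("user %s used key %s", "alice@example.com", "sk-ABCDEFGHIJKLMNOP1234")
print(stream.getvalue().strip())  # user [REDACTED_EMAIL] used key [REDACTED_SECRET]
```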
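
The rate-limit and retry-sleep cards both point to exponential backoff with jitter. The sketch below uses the "full jitter" variant, which sleeps a random time between zero and the exponential cap. `RateLimitError`, the delay constants, and `flaky_call` are hypothetical stand-ins for whatever your client library raises on an HTTP 429 response; the `sleep` parameter is injectable so the example runs instantly.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception for a throttled call (e.g. an HTTP 429 response)."""


def call_with_backoff(func, max_retries=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call func(); on RateLimitError, retry with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller see the failure
            cap = min(max_delay, base_delay * 2 ** attempt)  # 0.5, 1, 2, 4, ... capped
            sleep(random.uniform(0, cap))  # full jitter spreads out retrying clients


# Usage: a hypothetical call that is throttled twice and then succeeds.
calls = {"n": 0}
delays = []


def flaky_call():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, calls["n"], len(delays))  # ok 3 2
```

Without jitter, every client throttled at the same moment would retry at the same moment, recreating the spike; randomizing within the cap is what avoids the thundering herd.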
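
To make the latency-percentile card concrete, the sketch below computes p50, p90, and p99 with the nearest-rank method over a hypothetical window of request latencies. Monitoring systems often use interpolated or histogram-based estimates instead, so exact values can differ slightly between tools.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of empty sample set")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds for one monitoring window.
latencies_ms = [12, 15, 11, 14, 13, 200, 16, 12, 18, 950]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}
print(summary)  # {'p50': 14, 'p90': 200, 'p99': 950}
```

The median looks healthy while p90 and p99 expose the slow outliers, which is why the card suggests prioritizing by p99 impact. With only ten samples p99 is simply the maximum; tail percentiles need large sample counts to be meaningful.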
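
The agent-health card names heartbeat frequency and success rate as the primary signals. The hypothetical `HeartbeatMonitor` below tracks both for a single agent; the 30-second interval and three-interval tolerance are assumed values, not product defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Tracks one agent's heartbeats; interval and tolerance values are assumptions."""
    expected_interval: float = 30.0  # seconds between heartbeats (assumed)
    missed_tolerance: int = 3        # intervals that may pass silently before alerting (assumed)
    last_seen: Optional[float] = None
    received: int = 0
    failed: int = 0

    def record(self, timestamp: float, ok: bool = True) -> None:
        """Register one heartbeat; ok=False means the agent reported a failed self-check."""
        self.last_seen = timestamp
        self.received += 1
        if not ok:
            self.failed += 1

    def success_rate(self) -> float:
        return 1.0 if self.received == 0 else (self.received - self.failed) / self.received

    def is_alive(self, now: float) -> bool:
        """Alive if the last heartbeat arrived within missed_tolerance intervals."""
        if self.last_seen is None:
            return False
        return now - self.last_seen < self.expected_interval * self.missed_tolerance


# Usage with hypothetical heartbeat timestamps (seconds).
monitor = HeartbeatMonitor()
for timestamp, ok in [(0, True), (30, True), (60, False), (90, True)]:
    monitor.record(timestamp, ok)
print(monitor.success_rate())     # 0.75
print(monitor.is_alive(now=120))  # True: last heartbeat 30 s ago
print(monitor.is_alive(now=200))  # False: 110 s of silence exceeds 3 x 30 s
```

Alerting on a run of missed intervals rather than a single late heartbeat is one way to follow the alerting card's advice on avoiding alert fatigue.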