Troubleshooting, Monitoring & Best Practices (AB-900) Flashcards

Microsoft 365 Certified: Copilot and Agent Administration Fundamentals AB-900 Flashcards

Study our Troubleshooting, Monitoring & Best Practices (AB-900) flashcards for the Microsoft 365 Certified: Copilot and Agent Administration Fundamentals AB-900 exam with 40+ flashcards.

Front	Back
Action when an agent becomes unresponsive	Restart the agent, capture diagnostics, and collect the latest logs for analysis
Best practice for blue-green deployments	Use traffic shifting to validate the new release, then roll back quickly on failure
Best practice for configuration management	Store configs in version control and use immutable deployments for reproducibility
Best practice for dependency updates	Pin dependency versions, run automated tests, and deploy to a canary before full rollout
Best practice for logging sensitive data	Mask or redact PII at source and avoid logging secrets
Common cause of high API costs	Excessive token usage, inefficient prompting, or lack of caching
Common cause when Copilot returns irrelevant answers	Insufficient context or a wrong system prompt; provide clearer context, update the system prompt, and retry
First step when latency spikes occur	Check resource utilization (CPU, memory, and network), then correlate with recent deployments
How to audit changes that caused regression	Check commit history and CI pipeline artifacts, then roll back to a stable release
How to collect logs for an agent	Enable debug logging in the agent config; gather application logs, system logs, and transport logs
How to confirm data exfiltration risk	Check outbound connections, access logs, and unusual data transfer patterns
How to detect security incidents	Monitor for unusual authentication attempts, privilege escalations, and unexpected outbound traffic
How to diagnose memory leaks in agents	Monitor memory growth over time using heap dumps and profiler captures
How to handle corrupted model cache	Clear the cache, restart the service, and warm the cache with known-good requests
How to handle rate limit errors	Implement exponential backoff retries and request batching where possible
How to monitor latency percentiles	Track p50, p90, and p99, and prioritize fixes based on p99 impact
How to monitor model prompt usage	Log prompts and correlate with cost and performance while applying privacy controls
How to perform root cause analysis for errors	Reproduce the issue, capture logs and traces, then narrow down to a code or infra change
How to prevent replay attacks	Implement nonces, timestamps, and short-lived tokens
How to profile CPU hot spots	Use sampling profilers to find functions with highest CPU time and optimize or refactor
How to reduce cold start latency	Keep warm instances, use lightweight initialization, and preload models where possible
How to reproduce intermittent failures	Record input and environment state, then run stress tests with the same load profile
How to secure logs in transit and at rest	Use TLS for transport and encryption with access controls for stored logs
How to test disaster recovery plans	Run scheduled failover drills and validate data integrity and recovery time objectives
How to tune prompt length for performance	Minimize context to necessary tokens and cache static context where possible
How to validate agent permissions	Audit IAM roles and least-privilege assignments, and run permission checks
Indicator of throttling at network layer	Increase in connection resets, timeouts, or HTTP 429 responses from services
Key indicator of model degradation	Shift in user satisfaction scores or sudden drop in task completion rate
Primary metric to monitor agent health	Heartbeat or alive signal frequency and success rate
Recommended alerting strategy	Avoid alert fatigue by setting severity thresholds and routing alerts to on-call staff with runbooks
Recommended retention policy for logs	Keep high-fidelity logs short term for debugging and aggregated summaries longer term
Recommended sleep strategy for retry logic	Use exponential backoff with jitter to avoid thundering herd problems
Steps for secure incident response	Isolate affected systems, preserve evidence, rotate credentials, and perform forensic analysis
Tool to centralize logs across instances	Use a log aggregator like Elasticsearch, Splunk, or a hosted logging service
Typical resolution for authentication failures	Verify credentials and tokens, check clock skew, and refresh or rotate keys
What to do on discovery of leaked keys	Revoke keys, rotate secrets, and search logs for suspicious usage
What to include in a diagnostic bundle	Application logs, config files, traces, metrics, and recent deployment manifests
When to enable tracing	Enable distributed tracing for requests that span multiple services to identify bottlenecks
When to increase agent concurrency	When CPU and memory headroom exist and response latency remains acceptable
When to scale horizontally vs vertically	Scale horizontally for stateless services and vertically for a single process bound by CPU
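
The retry cards in the table (rate limit errors and the sleep strategy for retry logic) can be illustrated with a short Python sketch. It is not taken from any Microsoft SDK: RateLimitError, backoff_delay, and call_with_retries are hypothetical names, and the base delay, cap, and attempt count are assumed values chosen only for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter delay: a random value below min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(operation, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on RateLimitError with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Randomizing the wait spreads retries from many clients apart,
            # which is what avoids the thundering herd effect.
            sleep(backoff_delay(attempt, rng=rng))
```

The sleep and rng parameters are injectable so the retry schedule can be checked in a test without real waiting.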

About the Flashcards

Flashcards for the Microsoft 365 Certified: Copilot and Agent Administration Fundamentals exam focus on the operational lifecycle of intelligent agents. Review how to configure debug logging, centralize log streams, and protect sensitive information while meeting retention policies. Learn the key health metrics, heartbeat signals, and latency percentiles that reveal performance trends, plus effective alerting approaches that avoid fatigue.
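
As a rough illustration of the latency percentiles mentioned above, the following Python sketch computes p50, p90, and p99 from a list of response times using the nearest-rank method. The function names and the sample data are assumptions made for this example; a monitoring product may use a different percentile definition.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the three percentiles named on the flashcard for a list of latencies in ms."""
    return {
        "p50": percentile(samples_ms, 50),
        "p90": percentile(samples_ms, 90),
        "p99": percentile(samples_ms, 99),
    }


# Hypothetical data: 98 fast requests and two slow outliers.
samples = [10] * 98 + [500, 900]
summary = latency_summary(samples)
```

With this data the median and p90 both stay at 10 ms while p99 is 500 ms, which is why the card suggests prioritizing fixes by p99 impact: the tail is invisible in the median.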

The deck also guides you through troubleshooting techniques for high latency, memory leaks, rate limits, and API cost spikes. It covers secure incident response, blue-green deployments, scaling decisions, and prompt optimization to keep models responsive and cost-effective. Use these cards to reinforce terminology and best-practice workflows that often appear in exam scenarios.
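
One way to picture the caching advice behind the API cost cards is a small response cache in front of the model call. This Python sketch is only an assumption-laden illustration: CachedModelClient and the call_model function it wraps are hypothetical, and reusing a cached answer is appropriate only when identical requests are expected to produce an acceptable identical response.

```python
import hashlib


class CachedModelClient:
    """Wraps a hypothetical model-call function with an in-memory response cache."""

    def __init__(self, call_model):
        # call_model is a hypothetical callable: (system_prompt, user_prompt) -> str
        self._call_model = call_model
        self._cache = {}
        self.billed_calls = 0  # calls that actually reached the model

    def ask(self, system_prompt, user_prompt):
        # Key on both prompts so a changed system prompt never reuses an old answer.
        raw = f"{system_prompt}\x00{user_prompt}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._cache:
            self.billed_calls += 1
            self._cache[key] = self._call_model(system_prompt, user_prompt)
        return self._cache[key]
```

Comparing billed_calls with the total number of ask() calls gives a simple cache hit rate, which can be logged alongside token usage when investigating a cost spike.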

Topics covered in this flashcard deck:

Agent monitoring metrics
Logging & tracing
Performance troubleshooting
Security & incident response
Scaling and deployments
