Troubleshooting, Monitoring & Best Practices (AB-900) Flashcards

Microsoft 365 Certified: Copilot and Agent Administration Fundamentals AB-900 Flashcards

Study our Troubleshooting, Monitoring & Best Practices (AB-900) flashcards for the Microsoft 365 Certified: Copilot and Agent Administration Fundamentals AB-900 exam with 40+ flashcards. Review key concepts as flashcards, a searchable table, or an interactive matching game to reinforce exam concepts.
| Front | Back |
| --- | --- |
| Action when an agent becomes unresponsive | Restart the agent, capture diagnostics, and collect the last logs for analysis |
| Best practice for blue-green deployments | Use traffic shifting to validate the new release, then roll back quickly on failure |
| Best practice for configuration management | Store configs in version control and use immutable deployments for reproducibility |
| Best practice for dependency updates | Pin dependency versions, run automated tests, and deploy to a canary before full rollout |
| Best practice for logging sensitive data | Mask or redact PII at the source and avoid logging secrets |
| Common cause of high API costs | Excessive token usage, inefficient prompting, and lack of caching |
| Common cause when Copilot returns irrelevant answers | Insufficient context or a wrong system prompt; provide clearer context, update the system prompt, and retry |
| First step when latency spikes occur | Check resource utilization (CPU, memory, and network), then correlate with recent deployments |
| How to audit changes that caused a regression | Check commit history and CI pipeline artifacts, and roll back to a stable release |
| How to collect logs for an agent | Enable debug logging in the agent config; gather application logs, system logs, and transport logs |
| How to confirm data exfiltration risk | Check outbound connections, access logs, and unusual data transfer patterns |
| How to detect security incidents | Monitor for unusual authentication attempts, privilege escalations, and unexpected outbound traffic |
| How to diagnose memory leaks in agents | Monitor memory growth over time using heap dumps and profiler captures |
| How to handle corrupted model cache | Clear the cache, restart the service, and warm the cache with known-good requests |
| How to handle rate limit errors | Implement retries with exponential backoff, and batch requests where possible |
| How to monitor latency percentiles | Track p50, p90, and p99, and prioritize fixes based on p99 impact |
| How to monitor model prompt usage | Log prompts and correlate them with cost and performance while applying privacy controls |
| How to perform root cause analysis for errors | Reproduce the issue, capture logs and traces, then narrow down to a code or infrastructure change |
| How to prevent replay attacks | Implement nonces, timestamps, and short-lived tokens |
| How to profile CPU hot spots | Use sampling profilers to find the functions with the highest CPU time, then optimize or refactor them |
| How to reduce cold start latency | Keep instances warm, use lightweight initialization, and preload models where possible |
| How to reproduce intermittent failures | Record the input and environment state, then run stress tests with the same load profile |
| How to secure logs in transit and at rest | Use TLS for transport, and encryption with access controls for stored logs |
| How to test disaster recovery plans | Run scheduled failover drills and validate data integrity and recovery time objectives |
| How to tune prompt length for performance | Minimize context to the necessary tokens and cache static context where possible |
| How to validate agent permissions | Audit IAM roles and least-privilege assignments, and run permission checks |
| Indicator of throttling at network layer | Increase in connection resets, timeouts, or HTTP 429 responses from services |
| Key indicator of model degradation | Shift in user satisfaction scores or a sudden drop in task completion rate |
| Primary metric to monitor agent health | Heartbeat (alive signal) frequency and success rate |
| Recommended alerting strategy | Avoid alert fatigue by setting severity thresholds and routing alerts to on-call staff with runbooks |
| Recommended retention policy for logs | Keep high-fidelity logs short term for debugging and aggregated summaries longer term |
| Recommended sleep strategy for retry logic | Use exponential backoff with jitter to avoid thundering herd problems |
| Steps for secure incident response | Isolate affected systems, preserve evidence, rotate credentials, and perform forensic analysis |
| Tool to centralize logs across instances | Use a log aggregator like Elasticsearch, Splunk, or a hosted logging service |
| Typical resolution for authentication failures | Verify credentials and tokens, check clock skew, and refresh or rotate keys |
| What to do on discovery of leaked keys | Revoke keys, rotate secrets, and search logs for suspicious usage |
| What to include in a diagnostic bundle | Application logs, config files, traces, metrics, and recent deployment manifests |
| When to enable tracing | Enable distributed tracing for requests that span multiple services to identify bottlenecks |
| When to increase agent concurrency | When CPU and memory headroom exist and response latency remains acceptable |
| When to scale horizontally vs vertically | Scale horizontally for stateless services and vertically for a single process bound by CPU |
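
The cards on rate limit errors and retry sleep strategy both point to exponential backoff with jitter. The following is a minimal Python sketch of that idea, not a reference implementation: `RateLimitError`, `backoff_delays`, and `call_with_retries` are hypothetical names introduced for illustration, and the "full jitter" variant (a random delay between zero and the exponential ceiling) is one common choice among several.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Yield one jittered delay per retry.

    The ceiling doubles on each attempt (base * 2**attempt) up to `cap`,
    and the actual delay is a random fraction of that ceiling.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def call_with_retries(func, max_retries=5, sleep=time.sleep, **backoff_kwargs):
    """Call func(); on RateLimitError, sleep for a jittered delay and retry."""
    for delay in backoff_delays(max_retries, **backoff_kwargs):
        try:
            return func()
        except RateLimitError:
            sleep(delay)
    # Final attempt after all retries are used; any error propagates.
    return func()
```

Randomizing the delay spreads retries from many clients over time, which is what keeps them from hitting the service in synchronized waves (the thundering herd problem named on the card).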
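
The latency percentiles card (track p50, p90, and p99) can be made concrete with a small calculation. This sketch assumes latencies are collected as a plain list of milliseconds and uses the nearest-rank definition of a percentile; monitoring tools may interpolate differently, and `percentile` and `latency_summary` are hypothetical helper names.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100.

    Returns the smallest sample such that at least p percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize latencies with the three percentiles named on the card."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 90, 99)}
```

For a sample where 98 of 100 requests take 10 ms and two take 500 ms and 900 ms, p50 and p90 both stay at 10 ms while p99 jumps to 500 ms, which is why the card suggests prioritizing by p99 impact: it exposes tail latency that the median hides.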
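
For the card on logging sensitive data, one way to mask PII "at the source" is to redact messages before any handler sees them. The sketch below uses Python's standard `logging.Filter`; the two regular expressions are illustrative assumptions that only catch email addresses and simple `key=value` secrets, so a real deployment would need patterns matched to its own data.

```python
import logging
import re

# Illustrative patterns only; they are assumptions, not a complete PII list.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_RE = re.compile(
    r"\b(api[_-]?key|token|password)\s*[=:]\s*\S+", re.IGNORECASE
)


def redact(text):
    """Replace email addresses and key=value style secrets with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before handlers emit it."""

    def filter(self, record):
        # Format msg % args first so values passed as arguments are covered.
        record.msg = redact(record.getMessage())
        record.args = None
        return True
```

Attaching the filter with `logger.addFilter(RedactingFilter())` applies it to every record that logger creates, so redaction does not depend on each call site remembering to mask its own values.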
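
The replay attack card lists nonces and timestamps as defenses. A rough sketch of how the two combine is shown below, assuming requests are signed with a shared HMAC secret; `ReplayGuard` is a hypothetical class, and its in-memory nonce store is a simplification that would not survive restarts or work across multiple instances.

```python
import hashlib
import hmac
import time


class ReplayGuard:
    """Reject requests that are unsigned, stale, or reuse a nonce."""

    def __init__(self, secret, max_age_seconds=300.0):
        self.secret = secret
        self.max_age = max_age_seconds
        self.seen = {}  # nonce -> timestamp of the accepted request

    def sign(self, payload, nonce, timestamp):
        """Sign nonce, timestamp, and payload together with HMAC-SHA256."""
        message = f"{nonce}|{timestamp}|".encode() + payload
        return hmac.new(self.secret, message, hashlib.sha256).hexdigest()

    def verify(self, payload, nonce, timestamp, signature, now=None):
        now = time.time() if now is None else now
        # Forget nonces whose requests would be rejected as stale anyway.
        self.seen = {
            n: t for n, t in self.seen.items() if now - t <= self.max_age
        }
        expected = self.sign(payload, nonce, timestamp)
        if not hmac.compare_digest(expected, signature):
            return False  # tampered or signed with a different secret
        if abs(now - timestamp) > self.max_age:
            return False  # outside the accepted time window
        if nonce in self.seen:
            return False  # same nonce already used: replay
        self.seen[nonce] = timestamp
        return True
```

The timestamp window bounds how long nonces must be remembered, and the nonce check blocks a captured request from being resent inside that window. Short-lived tokens, the third item on the card, would limit how long the signing credential itself stays valid.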