During a monthly post-mortem review, a systems administrator notes four unplanned outages for a file server. The elapsed time from detection of the fault to full restoration for each incident was 12 minutes, 45 minutes, 18 minutes, and 15 minutes. To update the system documentation, the administrator must record the current mean time to recover (MTTR). Which value should be entered?
MTTR is the average time required to restore a system or service after a failure. The administrator therefore adds the recovery durations for the period (12 + 45 + 18 + 15 = 90 minutes) and divides by the number of incidents (4), giving an MTTR of 22.5 minutes, or about 23 minutes when rounded to the nearest whole minute. Recording the longest single outage (45 minutes) or the cumulative downtime (90 minutes) would misrepresent average recovery performance. A value of 7.5 days confuses MTTR with mean time between failures (MTBF): it appears to come from dividing a roughly 30-day month by the four failures, so it describes how often failures occur rather than how long recovery takes.
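The arithmetic can be checked with a short script. This is only a minimal sketch: the function name `mean_time_to_recover` and the variable names are illustrative assumptions, not part of any monitoring tool, and the durations are the four values given in the question.

```python
# Recovery durations in minutes for the four incidents in the question.
recovery_minutes = [12, 45, 18, 15]


def mean_time_to_recover(durations):
    """Return the arithmetic mean of the recovery durations (hypothetical helper)."""
    if not durations:
        raise ValueError("at least one incident is required")
    return sum(durations) / len(durations)


total_downtime = sum(recovery_minutes)          # cumulative downtime, not the MTTR
longest_outage = max(recovery_minutes)          # worst case, not the MTTR
mttr = mean_time_to_recover(recovery_minutes)   # total downtime / number of incidents

print(f"Total downtime: {total_downtime} minutes")
print(f"Longest outage: {longest_outage} minutes")
print(f"MTTR: {mttr} minutes")
```

Computing the total and the maximum alongside the mean makes it easy to see why 90 minutes and 45 minutes are distractors rather than the MTTR.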