A 2U virtualization host has powered itself off twice this week during nightly batch processing. Each time, the hardware‐management interface records the following events in order:
14:27:15 CPU0 Temperature Critical (95 °C)
14:27:15 Fan 4 Failure Detected (0 RPM)
14:27:17 System entering emergency thermal shutdown
The server room is holding 22 °C, rack air filters are clean, and no firmware updates were performed recently. After creating an image-level backup of the virtual machines, which action should the administrator take NEXT to verify that a malfunctioning fan is the root cause of the shutdowns?
Flash the BIOS and management-controller firmware to the latest versions, then clear all hardware logs.
Remove the heat sinks, apply new thermal paste to both CPUs, and reseat the heat sinks firmly.
Relocate the server to a rack with colder intake air and monitor temperatures during the next batch run.
Replace Fan 4 with a known-good spare and run the server's hardware diagnostics to confirm normal RPM readings.
The log shows Fan 4 reporting 0 RPM in the same second that CPU0 reaches a critical 95 °C; two seconds later the platform initiates an emergency thermal shutdown. A non-spinning fan starves the heat sinks of airflow, and the remaining fans cannot compensate quickly enough, so the server protects itself by powering off. Swapping the suspect fan for a known-good unit and rerunning the server's built-in diagnostics (or monitoring tachometer readings) directly tests the theory that the fan itself has failed. If the new fan spins at the expected RPM and temperatures remain within safe limits under load, the fault is confirmed and the incident is resolved.
Updating firmware (flashing the BIOS or the management controller) might correct phantom fan alarms but will not fix an actual mechanical failure, and it should be attempted only after hardware is ruled out; clearing the hardware logs would also discard the evidence of the fault. Reapplying thermal paste and reseating the heat sinks addresses poor thermal transfer, not loss of airflow. Relocating the chassis does nothing for a stopped fan, and the intake air is already at 22 °C. Therefore, replacing the fan and verifying normal operation is the most appropriate next step.
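The verification step can be sketched in Python as a simple pass/fail check over sensor samples collected after the fan swap. This is a minimal illustration, not a vendor tool: the `Sample` type, the `fan_fix_verified` function, and both thresholds are hypothetical, and real limits would come from the server vendor's documentation.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only;
# use the limits published for the actual server model.
MIN_FAN_RPM = 1000
MAX_CPU_TEMP_C = 85.0


@dataclass
class Sample:
    """One reading taken from the hardware-management interface."""
    fan_rpm: dict      # e.g. {"Fan 4": 5200}
    cpu_temp_c: dict   # e.g. {"CPU0": 61.0}


def fan_fix_verified(samples, fan="Fan 4"):
    """Return True only if the fan kept spinning and every CPU
    stayed below the temperature limit in every sample."""
    if not samples:
        return False  # no data means nothing was verified
    for s in samples:
        # A missing reading is treated the same as a stopped fan.
        if s.fan_rpm.get(fan, 0) < MIN_FAN_RPM:
            return False
        if any(t >= MAX_CPU_TEMP_C for t in s.cpu_temp_c.values()):
            return False
    return True
```

For example, a single sample matching the logged failure (`Sample({"Fan 4": 0}, {"CPU0": 95.0})`) fails the check, while a series of samples taken during a batch run with Fan 4 above the RPM floor and CPU0 below the temperature limit passes it.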