During a scheduled maintenance window, a junior administrator upgraded the processors in a dual-socket rack server. The system powers on and completes POST, but whenever CPU utilization rises above roughly 40 %, the host shuts itself off after just a few minutes. The BMC system event log shows repeated entries such as "CPU0 Thermal Trip" and "Critical temperature threshold exceeded." All chassis fans report nominal RPMs, and the room temperature is a steady 23 °C. Which of the following technical issues is the MOST likely cause of these unexpected shutdowns?
An improperly seated heat sink on the newly installed CPU
A backplane failure in the internal drive cage
A power-supply fault in the redundant PSU pair
Firmware incompatibility between the system BIOS and the BMC
A thermal trip event is generated when a processor's on-die thermal protection circuit detects a dangerously high temperature and forces the server to power off to prevent permanent damage. Vendor troubleshooting guides commonly attribute this event to a failure in the cooling solution and list "verify heatsink is properly attached and has thermal grease" as the first remediation step. If the heat sink was disturbed during the CPU swap, or was not re-seated with fresh thermal compound, contact between the CPU's heat spreader and the cooler will be poor. At low utilization the processor produces little enough heat that the poor contact goes unnoticed, but as load rises the extra heat cannot be conducted away fast enough, so the die overheats even though the fans report nominal speeds and the ambient temperature is normal.
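To show why the shutdowns appear only above a certain load, the heat-sink fault can be sketched with a simplified steady-state thermal model, T_die ≈ T_ambient + P × θ, where P is package power and θ is the total thermal resistance from die to air. All parameter values below (idle and full-load power, both thermal resistances, and the trip threshold) are hypothetical, chosen only for illustration; only the 23 °C ambient comes from the scenario. Real processors also throttle before reaching the trip point, which this model ignores.

```python
from typing import Optional

# Simplified steady-state model: T_die = T_ambient + P * theta.
# Every numeric parameter except the ambient temperature is hypothetical.
T_AMBIENT_C = 23.0   # room temperature from the scenario
T_TRIP_C = 105.0     # hypothetical thermal-trip threshold (°C)
P_IDLE_W = 40.0      # hypothetical idle package power (W)
P_MAX_W = 200.0      # hypothetical full-load package power (W)

THETA_GOOD = 0.3     # hypothetical °C/W: heat sink seated with fresh compound
THETA_POOR = 0.8     # hypothetical °C/W: heat sink poorly seated


def package_power(utilization: float) -> float:
    """Linearly interpolate package power between idle (0.0) and full load (1.0)."""
    return P_IDLE_W + utilization * (P_MAX_W - P_IDLE_W)


def die_temperature(utilization: float, theta: float) -> float:
    """Steady-state die temperature for a given load and thermal resistance."""
    return T_AMBIENT_C + package_power(utilization) * theta


def trip_utilization(theta: float) -> Optional[float]:
    """Lowest utilization at which the modelled die reaches T_TRIP_C, or None."""
    # Solve T_AMBIENT_C + (P_IDLE_W + u * span) * theta = T_TRIP_C for u.
    span = P_MAX_W - P_IDLE_W
    u = ((T_TRIP_C - T_AMBIENT_C) / theta - P_IDLE_W) / span
    if u > 1.0:
        return None  # threshold is never reached between 0 % and 100 % load
    return max(u, 0.0)  # clamp: a negative u would mean it trips even at idle


if __name__ == "__main__":
    for label, theta in (("seated", THETA_GOOD), ("poorly seated", THETA_POOR)):
        temps = ", ".join(
            f"{u:.0%}: {die_temperature(u, theta):.0f} °C" for u in (0.0, 0.4, 1.0)
        )
        trip = trip_utilization(theta)
        trip_text = "never" if trip is None else f"at {trip:.0%} load"
        print(f"{label:>14}: {temps}; trips {trip_text}")
```

With these assumed numbers, the well-seated cooler stays below the threshold even at full load, while the poorly seated one crosses it at about 39 % utilization, which mirrors the roughly 40 % pattern in the scenario. The point is the shape of the relationship (a higher θ lowers the load at which the trip point is reached), not the specific values.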
The other choices do not match the symptoms:
A power-supply fault would log voltage or power errors, not thermal-trip events.
A firmware mismatch between the BIOS and the BMC can cause POST or management errors, but it would not make the CPU overheat repeatedly, and only under load.
A backplane failure affects storage, not CPU temperature.
Therefore, an improperly seated heat sink on the new CPU is the most probable root cause.