A system administrator is investigating reports of data corruption on a critical database server. The corruption manifests as subtle, incorrect characters in various database records and does not align with specific user actions or application functions. While storage diagnostics show no disk failures, the server's management logs indicate numerous single-bit memory errors were corrected over the past month, but these corrections are no longer being reported. Which of the following is the MOST likely cause of this data corruption?
A zero-day exploit in the database application
Silent data corruption (bit rot) on the storage array
Failing ECC memory
Filesystem journaling errors
The correct answer is failing ECC memory. Error-Correcting Code (ECC) memory is designed to detect and correct single-bit errors in RAM. A high, sustained count of corrected errors in the logs indicates that a memory module was degrading. The sudden absence of new corrections is not a sign of recovery. It suggests either that the module has degraded to the point where errors span multiple bits, beyond what the ECC scheme can reliably handle, or that the ECC or error-reporting function itself has failed. Typical SECDED ECC corrects single-bit errors and detects, but cannot correct, double-bit errors; errors affecting three or more bits can go undetected or be silently miscorrected. This allows corrupted data to pass from memory to the CPU and then be written to the database, producing the subtle, random corruption described. Silent data corruption (bit rot) on the storage array is less likely because storage diagnostics are clean and the log evidence points specifically to memory. A zero-day exploit is unlikely to cause such random, subtle errors and is not supported by the log evidence. Filesystem journaling errors relate to inconsistencies after a crash or unclean shutdown, not to the in-memory corruption indicated by the ECC error logs.
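To make the single-bit versus multi-bit distinction concrete, here is a minimal Python sketch of a toy SECDED (single-error-correcting, double-error-detecting) code: an extended Hamming code that protects 4 data bits with 4 check bits. This is an illustrative assumption, not the scheme any particular server uses. Real ECC DIMMs apply the same idea to wider words (commonly 64 data bits plus 8 check bits), and some platforms use stronger vendor-specific schemes. The helper names `encode`, `decode`, and `flip` are hypothetical.

```python
from typing import List, Tuple

# Toy model (assumption): an extended Hamming (8,4) SECDED code.
# Real ECC DIMMs use wider codes, but the correct/detect limits are analogous.


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit codeword.

    Index 0 is the overall parity bit, indices 1, 2 and 4 are Hamming
    parity bits, and indices 3, 5, 6 and 7 carry the data bits.
    """
    d1, d2, d3, d4 = data
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d1, d2, d3, d4
    code[1] = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    code[2] = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    code[4] = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2  # makes the whole codeword even parity
    return code


def decode(code: List[int]) -> Tuple[List[int], str]:
    """Return (data bits, status), where status is ok/corrected/uncorrectable."""
    c = list(code)
    syndrome = 0
    for i in range(1, 8):
        if c[i]:
            syndrome ^= i  # XOR of set positions is 0 for a valid codeword
    overall = sum(c) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume exactly one bit flipped; syndrome 0 points at the parity bit.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not fixable.
        status = "uncorrectable"
    return [c[3], c[5], c[6], c[7]], status


def flip(code: List[int], *positions: int) -> List[int]:
    """Simulate memory faults by inverting the given bit positions."""
    c = list(code)
    for p in positions:
        c[p] ^= 1
    return c


stored = encode([1, 0, 1, 1])
print(decode(flip(stored, 5)))        # ([1, 0, 1, 1], 'corrected')
print(decode(flip(stored, 3, 5)))     # ([0, 1, 1, 1], 'uncorrectable')
print(decode(flip(stored, 3, 5, 6)))  # ([0, 1, 0, 1], 'corrected'): wrong data
```

With one flipped bit the data is restored; with two flipped bits the error is detected but cannot be fixed; with three flipped bits the decoder "corrects" to the wrong data while reporting success. That last case shows how a badly degraded module can let corruption through without any uncorrectable-error report.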