Unstable system? It could be the memory

Do you have a server that 'crashes' irregularly?

Some possible causes:

Bad memory is probably the biggest culprit behind those odd, random server restart errors.

Symptoms include random crashes and inexplicable program segfaults.  And if you support production systems and carry the emergency pager, you have probably had a few sleep-interrupted nights.

Checking memory

The gold standard for checking memory is the memtest86 program.  Grab the ISO, burn it, reboot your system with it and wait 6 - 60 hours for a result.

Quick-and-dirty memtest86

If you need a quick idea of whether memory is at fault before taking a server offline for that long, try this:

# Determine the amount of memory in your system.
# /proc/meminfo reports it in kB, for example:
#   cat /proc/meminfo
#   MemTotal:      2068052 kB
memkb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# Use the kB figure as the dd block size (in bytes) with
# a count of 1050, so that the created file will be
# a bit bigger than the installed memory (a count of
# 1024 would match the actual memory size).
dd if=/dev/urandom bs=$memkb of=large count=1050; md5sum large; md5sum large; md5sum large

dd will create an output file (called large) filled with random bytes from /dev/urandom.  The file will be bs*count bytes big.  md5sum then outputs a checksum for that file three times.
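As a rough check of the bs*count arithmetic, the sketch below uses the example MemTotal value shown above (2068052 kB).  The function and variable names are illustrative only and are not part of the original commands.

```shell
# Illustrative helpers (hypothetical names): compare the size of the
# file dd creates with the size of the installed memory.

# file_size_bytes KB COUNT: bytes written by dd with bs=KB count=COUNT
file_size_bytes() {
    echo $(( $1 * $2 ))
}

# mem_size_bytes KB: MemTotal (reported in kB) converted to bytes
mem_size_bytes() {
    echo $(( $1 * 1024 ))
}

total_kb=2068052   # example MemTotal value from /proc/meminfo
echo "file:   $(file_size_bytes "$total_kb" 1050) bytes"   # 2171454600
echo "memory: $(mem_size_bytes "$total_kb") bytes"         # 2117685248
```

With these numbers the file comes out roughly 51 MB larger than the installed memory, which is the "bit bigger" that the count of 1050 is meant to give.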

To compute the checksum, Linux has to read the file, and it will cache as much of the data as it can in memory.  Presuming you have less than 10GB of memory, the server will use up all its memory during the file read.  Comparing the md5sum results then checks that the same bits can be saved to and read from memory consistently.

If the checksums do not match, the same file has produced different data on different reads, which points strongly to faulty memory.

If the checksums match, try running the md5sum commands a couple more times.  If the checksums are consistently the same then you may or may not have faulty memory.  Run memtest86 to be sure.
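Comparing the checksums by eye gets tedious over several runs.  A minimal sketch of automating the comparison follows; check_consistent is a hypothetical helper name, and the default of 5 runs is an arbitrary assumption.

```shell
# check_consistent FILE [RUNS]
# Hypothetical helper: checksum FILE RUNS times (default 5) and
# report whether every run produced the same md5sum.
check_consistent() {
    file=$1
    runs=${2:-5}
    first=$(md5sum "$file" | awk '{print $1}')
    i=1
    while [ "$i" -lt "$runs" ]; do
        next=$(md5sum "$file" | awk '{print $1}')
        if [ "$next" != "$first" ]; then
            echo "MISMATCH on run $((i + 1)): $first vs $next"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $runs checksums match: $first"
    return 0
}
```

For example, "check_consistent large 10" would checksum the file created by dd ten times.  A non-zero exit status means at least one run disagreed with the first.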

The test can return matching checksums on a system that still has bad memory, since not all memory is addressable by the kernel.  In addition, the server will be running software that won't budge from its current position in memory (e.g. the kernel and probably most applications), so that memory won't be tested.  Running the test on a server with minimal applications running is therefore a good way to improve its accuracy.