Tuesday, May 26, 2015

RHEL - system crashed due to memory issue



The server had a memory issue and the /var filesystem is full.
/dev/mapper/VolGroup00-volvar   100% /var

Please expand on what you mean by memory issues?

It appears the server crashed and generated a dump file, which caused /var to fill up. It appears the creation of the crash dump also caused the memory errors.
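Since a crash dump filled /var here, a quick check of how full a filesystem is can help catch this earlier. Below is a minimal sketch, assuming a POSIX `df` and `awk`; the function names `fs_use_pct` and `fs_is_full` and the 90% default limit are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Use% figure (digits only) for the
# filesystem holding the given path. df -P keeps each entry on one line.
fs_use_pct() {
    df -P "${1:-/var}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when usage is at or above the limit.
# The default limit of 90 is an assumption, not a value from this incident.
fs_is_full() {
    pct=$(fs_use_pct "$1")
    [ "$pct" -ge "${2:-90}" ]
}
```

For example, `fs_is_full /var 95 && echo "/var is nearly full"` prints a warning only when /var is at 95% or more.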

=========================================================

mcelog logs hardware-related errors on Linux-based x86 systems. The tool is mostly used on physical servers; it starts at boot time (it used to be run from cron) and runs as a daemon. It can detect hardware errors such as system bus errors, CPU errors (cache errors on the processor or hardware) and, most importantly, memory errors (Error Correction Code, ECC). Once an error threshold is reached, it can predictively offline memory pages and CPUs based on the errors. If you check for errors frequently, you can find the problem before the server panics and crashes.


Install mcelog
# yum install mcelog

Verify the daemon is running
# mcelog --client
# /etc/init.d/mcelog status
# service mcelogd status

Dependencies
- Make sure /dev/mcelog exists. If not, create it with the mknod command.
# mknod /dev/mcelog c 10 227
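
The dependency check can be scripted so it is safe to run repeatedly. This is a minimal sketch; the function name `check_mce_device` is made up for illustration, and the major/minor numbers are simply the ones from the mknod line above. It only reports what is missing and does not create the node itself.

```shell
#!/bin/sh
# Hypothetical helper: report whether the mcelog device node is present.
# Defaults to /dev/mcelog; another path can be passed as the first argument.
check_mce_device() {
    dev="${1:-/dev/mcelog}"
    if [ -c "$dev" ]; then
        echo "ok: $dev is a character device"
        return 0
    fi
    # Only suggest the command; creating the node needs root.
    echo "missing: $dev (as root, create with: mknod $dev c 10 227)"
    return 1
}
```

Run `check_mce_device` with no arguments to test the default /dev/mcelog path.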


How to find the error?
- Log in to the console and run the mcelog command, which reads messages from the kernel. Make sure to send the output to a file, because you can't re-run it to see the same error again.
# /usr/sbin/mcelog >/var/tmp/mymce.log

Check the log
# more /var/log/mcelog
# grep -i "hardware error" /var/log/mcelog
# tail -200 /var/log/mcelog

Put it in cron:
[ $(grep -ci "hardware error" /var/log/mcelog) -gt 0 ] && echo "Hardware Error on $(hostname)" | mailx -s "Error on $(hostname)" sam@domain.com
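
The same check is easier to maintain as a small script than as a one-liner. Below is a minimal sketch, assuming the log lives at /var/log/mcelog; the function names `count_hw_errors` and `hw_error_alert` are made up for illustration. It treats a missing log as zero errors and only builds the alert text, so mailing stays a separate step.

```shell
#!/bin/sh
# Count lines mentioning "hardware error" (case-insensitive) in a log file.
count_hw_errors() {
    log="${1:-/var/log/mcelog}"
    # A missing or unreadable log counts as zero errors.
    [ -r "$log" ] || { echo 0; return 0; }
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so its exit status is ignored here.
    grep -ci "hardware error" "$log" || true
}

# Print the alert text when the count is non-zero; print nothing otherwise.
hw_error_alert() {
    n=$(count_hw_errors "$1")
    if [ "$n" -gt 0 ]; then
        echo "Hardware Error on $(hostname): $n matching line(s)"
    fi
}
```

A cron job could then run something like `msg=$(hw_error_alert /var/log/mcelog); [ -n "$msg" ] && echo "$msg" | mailx -s "Error on $(hostname)" sam@domain.com`. Note that this keeps alerting on every run until the log is rotated or cleared.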




Most systems are set up by default to write the log to /var/log/mcelog.

Some commands
mcelog
mcelog --k8
mcelog --k8 --ascii
mcelog --k8 /dev/mcelog
mcelog --ascii /dev/mcelog
mcelog --ascii > changelog.txt
dmesg | grep ADMA
dmesg | grep ata5
vmstat -d


Note: If mcelog is running as a daemon, you get the /dev/mcelog output when the MCE actually happens.

More on http://www.mcelog.org/
