On Thu, 2009-05-21 at 15:13 +0200, Peter Hopfgartner wrote:
Dear ML
We upgraded a Dell PowerEdge 1950 server on the 8th of May. Since then the server has rebooted 3 times without external cause (it is located in a server farm with redundant power supply etc.). Looking at the server's monitoring infrastructure with Dell's own OpenManage tools, I get strange errors:
[root@servernew ~]# omreport system esmlog
(....)
Severity      : Critical
Date and Time : Mon May 11 17:46:59 2009
Description   : System Software event: run-time critical stop was asserted

Severity      : Critical
Date and Time : Fri May 15 21:07:57 2009
Description   : System Software event: run-time critical stop was asserted

Severity      : Critical
Date and Time : Wed May 20 21:00:53 2009
Description   : System Software event: run-time critical stop was asserted
(...)
This class of errors never happened before in the more than a year that the server has been running.
There is no mention of any anomaly, except the boot messages themselves, in /var/log/messages.
The server runs the 64-bit flavor of CentOS and hosts some Xen virtual machines and some PostgreSQL and MySQL databases. It ran without any issues with CentOS 5.1 and 5.2.
I suspect a kernel- or software-related problem, but I do not know how to diagnose it more precisely.
Can anybody give me a hint? Has anybody had a similar issue?
Hmm... you *definitely* want to take this one to the Dell Linux list. Having said that, I did some googling for:
omreport run-time critical stop was asserted
and found only one hit, from someone who faced it in April 2007. Dell told them that it may have been software. I'd start there. Some additional questions: What version of CentOS? What kernel version? What version of the Dell tools?
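In case it helps, here is a minimal sketch of how those three answers could be collected in one go. It assumes a CentOS-style host where /etc/redhat-release exists, and that the installed OpenManage CLI supports `omreport about`. The function name collect_versions is made up for illustration, so adjust as needed.

```shell
#!/bin/sh
# Sketch only: gather the version details asked about above.
# collect_versions is a hypothetical helper name, not part of any Dell tool.

collect_versions() {
    # Distribution release string (this file is present on CentOS/RHEL).
    if [ -r /etc/redhat-release ]; then
        echo "release: $(cat /etc/redhat-release)"
    else
        echo "release: unknown (/etc/redhat-release not found)"
    fi

    # Version of the currently running kernel.
    echo "kernel: $(uname -r)"

    # OpenManage version, printed only if omreport is on the PATH.
    if command -v omreport >/dev/null 2>&1; then
        omreport about
    else
        echo "omreport: not installed"
    fi
}

collect_versions
```

Pasting that output into a post on the Dell Linux list, together with the esmlog entries, should give people there enough to start from.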
-I