on 14:31 Wed 09 Mar, Michael Eager (eager@eagerm.com) wrote:
Dr. Ed Morbius wrote:
If the issue is repeated but rare system failures on one of a set of similarly configured hosts, I'd RMA the box and get a replacement. End of story.
I'll repeat: this is a house-made system. There's no vendor to RMA to. It seems obvious to me: RMA is not a diagnostic tool.
You fab your own silicon?
I saw your reference to a homebrew machine after I'd posted. You'd neglected to provide this information initially.
Knowing some basic stuff, like CPU architecture, memory allocation, disk subsystem, kernel modules, etc., would help.
If you'd post details of the host, more logging information, netconsole panic logs, etc., it might be possible to narrow down possible causes.
The problem is that there are NO DIAGNOSTICS generated when the system hangs. There's no panic and nothing in the logs which indicates any problem. This is what I indicated from the get-go.
  uname -a
  /proc/cpuinfo
  /proc/meminfo
  lspci
  lsmod
  /proc/mounts
  /proc/scsi/scsi
  /proc/partitions
  dmidecode
... would be useful for starters.
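If it saves typing, here's a rough sketch that dumps everything in the list above into one text file you can attach. The function name collect_sysinfo and the default report filename are made up for illustration; the commands and /proc paths are the ones listed. It assumes a POSIX shell, and dmidecode generally needs root to return anything useful.

```shell
#!/bin/sh
# Gather the basic host details listed above into one text report.
# collect_sysinfo is a hypothetical helper name, not an existing tool.
collect_sysinfo() {
    report=${1:-sysinfo-report.txt}
    : > "$report" || return 1

    # Commands: note any that are missing instead of aborting.
    for cmd in "uname -a" "lspci" "lsmod" "dmidecode"; do
        printf '===== %s =====\n' "$cmd" >> "$report"
        # ${cmd%% *} strips the arguments, leaving the program name.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            $cmd >> "$report" 2>&1 || echo "(command failed)" >> "$report"
        else
            echo "(not installed)" >> "$report"
        fi
    done

    # Files: note any that are absent or unreadable.
    for f in /proc/cpuinfo /proc/meminfo /proc/mounts \
             /proc/scsi/scsi /proc/partitions; do
        printf '===== %s =====\n' "$f" >> "$report"
        if [ -r "$f" ]; then
            cat "$f" >> "$report" 2>&1
        else
            echo "(not readable)" >> "$report"
        fi
    done
}
```

Source the file, run `collect_sysinfo /tmp/report.txt` as root, and post the result. Sections that come back "(not installed)" or "(not readable)" are still worth including, since they show what isn't available.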
If you've built your own kernel, your config options would help too (if you're running stock, we can get those from the package itself).
As would wiring up netconsole as I initially suggested.
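For reference, a minimal sketch of the netconsole wiring, on the assumption that a second machine on the same LAN can listen for the messages. The IP addresses, interface name, and MAC address below are placeholders, and netconsole_param is a hypothetical helper that only builds the module option string; the option format itself (src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac) is the one given in the kernel's netconsole documentation. Since it sends kernel messages straight out over UDP, it may catch output that never makes it to the on-disk logs before a hang.

```shell
#!/bin/sh
# Build the netconsole module option string.
# netconsole_param is a hypothetical helper; all addresses are placeholders.
#   $1 = source IP (the hanging host)   $2 = outgoing interface
#   $3 = target IP (the log receiver)   $4 = target MAC address
#   $5 = UDP port, used for both ends (defaults to 6666)
netconsole_param() {
    port=${5:-6666}
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$port" "$1" "$2" "$port" "$3" "$4"
}

# On the hanging host, as root:
#   dmesg -n 8    # raise the console log level so all messages are sent
#   modprobe netconsole "$(netconsole_param 192.168.1.10 eth0 \
#       192.168.1.20 00:11:22:33:44:55)"
#
# On the receiving host, before reproducing the hang:
#   nc -u -l 6666    # some netcat variants want: nc -u -l -p 6666
```

Leave the listener running with its output redirected to a file; whatever arrives just before the box locks up is the interesting part.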
If I can clarify: YOU are the person with the problem. WE are the people you're turning to for assistance. YOU are getting pissy. YOU should be focusing on providing relevant information, or noting that it's not available.
You're NOT obliged to repeat information you've already posted (e.g.: home-brew system), but it's helpful to front-load data rather than have us tease it out of you.
With what you've posted to date, it's not.
I could waste my time posting logs for you to tell me that they don't point to any problem. I'd rather skip that step.
Krell forfend you should post relevant information which might actually help diagnose your problem (or point to likely candidates and/or further tests).