On 07/08/2014 01:27 PM, Les Mikesell wrote:
> On Tue, Jul 8, 2014 at 11:25 AM, Lamar Owen <lowen at pari.edu> wrote:
>> Memory tests are redundant with ECC. (I know; I have an older
>> SuperMicro server here that passes memory testing in POST but throws
>> nearly continuous ECC errors in operation; it does operate, though).
>> If it fails during spinup, flag the failure while spinning up
>> another server.
> I don't think that is generally true. I've seen several IBM systems
> disable memory during POST and come up running with a smaller amount.

Yes, and I have a few Dells that do that as well. Unfortunately most
OS's aren't 'hotplug/unplug' for RAM, which would alleviate the need to
tag it out during POST. But perhaps some of today's and yesterday's
hardware just isn't up to the task of reliable rapid power on. So
perhaps I should have written 'Memory tests should be redundant with
ECC.'

> Our servers tend to just run till they die. If we didn't need them we
> wouldn't have bought them in the first place. I suppose there are
> businesses with different processes that come and go, but I'm not sure
> that is desirable.

Our load graphs here are very spurty, with the spurts going very high
during certain image reduction processes. It is to the point where I
could probably save money by putting a few of the more power-hungry
systems that have spurty loads on a timed sleep basis, with WoL bringing
them back up prior to the next day's batch. But that's an ad hoc
solution, and I really don't like ad hoc solutions when infrastructure
ones are available and better tested.

> If you need load balancing anyway you just run enough spares to cover
> the failures.

And pay the power bill for them
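
For reference, the wake-up half of the timed-sleep idea above can be sketched in a few lines. This is only a rough illustration, not what is running here: the function names are made up for the example, the MAC address in the usage note is a placeholder, and sending to the limited broadcast address on UDP port 9 is an assumption (it is a common convention for WoL, but the right address and port depend on the network and the NIC).

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Magic packet layout: 6 bytes of 0xFF, then the target MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting requires this socket option to be enabled explicitly.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Something like send_wol("00:11:22:33:44:55") run from cron on an always-on box shortly before the batch window would cover the wake side; the sleep side and the NIC/BIOS WoL settings still have to be handled on each server, which is part of why this stays an ad hoc approach.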