On Mon, 14 Feb 2011, Nico Kadel-Garcia wrote:
Trust me, it's a pain in the keister in production. If the standard is now enabled, good: I haven't had my hands inside a server in a year, I admit it. (My current role doesn't call for it.) It *didn't* used to be standard. Are you sure it is?
I buy whole machines not bits, and it's all preconfigured. I can't speak for the defaults in random motherboards.
I'm still seeing notes that the motherboards that support it are still significantly more expensive, "server grade". Unfortunately, I've worked for a manufacturer that repackaged consumer grade components for cheap pizza box servers, and we had some disagreements about where they cut corners.
There's a difference between high quality motherboards and motherboards advertised as high quality. Yes, you'll pay a bit more for ECC than not, but then I'll be paying more for dual PSU and IPMI as well. Since I then don't need an IP-KVM or a controllable PDU, it's worth the relatively small amount it costs.
It's very awkward to preserve BIOS settings across BIOS updates (read: impossible without a manual checklist) unless your environment is so sophisticated you're using LinuxBIOS.
Dell BIOS updates do not affect the settings, so it's quite easy.
Unless you've *really* invested and gotten remote KVM boxes or invested in Dell's DRAC or HP's remote console tools, *and set them up correctly at install time, and kept their network setups up to date*, they're a nightmare to do remotely with someone putting hands and eyes on the server. And the remote tools are *awful* at giving you BIOS access, often because the changes in screen resolution during different parts of the boot process confuse them, at least if you use the standard VGA-like access. You end up on that path when console access hasn't been set up, and setting it up *often requires someone to enable it from the BIOS*, which leads to a serious circular dependency.
Speaking for Dell here:
Generally speaking, get a machine that supports IPMI. A remote Serial-Over-LAN session can be initiated just nicely for editing BIOS settings if you need human driven remote BIOS tweaking, same as you would if you were stood at it. If you have a Dell, syscfg lets you edit a large number of the BIOS settings from within Linux, with an interface that doesn't vary between models. It's also useful when you get a replacement motherboard / new machine, as you can script it. That's all done through SMBIOS as far as I know.
All the IPMI stuff is configurable either through IPMITool or OMSA. Through OMSA it's identical across at least the last 3 generations of servers, and nigh on identical through IPMITool.
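As a rough sketch of the Serial-Over-LAN route, the snippet below wraps the ipmitool invocation in Python. It assumes the BMC already has LAN access and SOL enabled, and that BIOS console redirection to the serial port is switched on. The function names, host name and user are made up for illustration. The password is handed over through the IPMI_PASSWORD environment variable (ipmitool's -E option) so it stays off the command line.

```python
import os
import subprocess


def sol_activate_command(bmc_host, user):
    """Build the ipmitool argument list for a Serial-over-LAN session.

    bmc_host and user are placeholders for your own BMC address and account.
    """
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 LAN interface, which SOL requires
        "-H", bmc_host,    # address of the BMC, not of the host OS
        "-U", user,
        "-E",              # take the password from $IPMI_PASSWORD
        "sol", "activate",
    ]


def open_sol(bmc_host, user, password):
    """Run ipmitool interactively and return its exit status."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    return subprocess.call(sol_activate_command(bmc_host, user), env=env)
```

Calling open_sol("bmc1.example.com", "root", password) should drop you into the serial console. Reboot the box from a second session and the BIOS setup screens appear there, provided redirection is enabled.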
Now scale by a stack of slightly different models of servers with different interfaces for their BIOS management, and you have a mess to manage. I *LOVE* environments where the admins have been able to insist on, or install, LinuxBIOS because this is *solved* there. You can get at it from Linux userland as necessary, they reboot *much* faster, and you can download and back up the configurations for system reporting. It's my friend.
Standardisation is great, so yes, I'd love something like LinuxBIOS across the board. But without something like this, it's still something you can cope with.
Dells are solid, server class machines. I've seen HP oversold with a lot of promises about management tools that don't work that well, for tasks better integrated and managed by userland tools that *have to be done anyway*, and sold with a lot of genuinely unnecessary features. (Whose bright idea was it to switch servers to laptop hard drives? E-e-e-e-e-w-w-w-w-w!!!)
I hope this isn't a general dig at 2.5" disks?
ECC has a point, which I've acknowledged. But the overall "server class" hardware costs add up fast. SAS hard drives, 10Gig ethernet ports, dual power supplies, built-in remote KVM, expensive racking hardware, 15,000 RPM drives instead of 10,000 RPM, SAS instead of SATA, etc. all start adding up really fast when all you need is a so-called "pizza box".
But you *are* adding on lots of extras there that don't come pre-bundled with ECC. Hey, my *desktop* has ECC memory...
This is one reason I've gotten fond of virtualization. (VMWare or VirtualBox for CentOS 5, we'll see about KVM for RHEL and CentOS 6). Amortizing the costs of a stack of modest servers with such server class features across one central, overpowered server and doling out environments as necessary is very efficient and avoids a lot of the hardware management problems.
Sure.
It's the overall "enterprise class hardware" meme that I'm concerned about for a one-off CentOS grade server.
CentOS grade?
Are you sure it was fixed by memory replacement? Because I've seen most of my ECC reports as one-offs, never to recur again.
Yes. Reset the counters, retripped the warning. Moved the DIMM, problem followed the DIMM. Replaced the DIMM, all well again.
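For anyone wanting to watch the counters from Linux rather than through the vendor tools, here is a minimal sketch. It assumes an EDAC driver is loaded for your chipset and exposes per-controller counts under /sys/devices/system/edac/mc. It is not necessarily how the counts above were read, and the function names are just for illustration. Taking a snapshot before and after a soak period shows whether corrected errors are still accumulating or were a one-off.

```python
from pathlib import Path

# Standard EDAC sysfs location; only present if an EDAC driver is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} for each memory controller."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers we cannot read or parse
        counts[mc.name] = (ce, ue)
    return counts


def new_errors(before, after):
    """Return controllers whose corrected-error count rose between snapshots."""
    rises = {}
    for name, (ce, _ue) in after.items():
        delta = ce - before.get(name, (0, 0))[0]
        if delta > 0:
            rises[name] = delta
    return rises
```

This only narrows things down to a memory controller. Pinning it on one DIMM still means using the finer-grained per-row or per-DIMM counters (their layout varies with kernel version) or doing the swap test described above.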
Equally I've had file servers do the same. Running a file server without ECC is a recipe for disaster, as you're risking silent data corruption.
Core file servers, I'd agree, although a lot of the more common problems (such as single very expensive fileserver failure and lack of user available snapshots) are ameliorated by other approaches. (Multiple cheap SATA external hard drives for snapshot backups, NFS access so the users can recover personally deleted files, single points of failure in upstream connectivity, etc.)
Yes, there are requirements other than just sound hardware, but that doesn't mean sound hardware isn't a good starting point.
jh