[CentOS] Just need to vent

Tue Jan 26 18:52:56 UTC 2016
Warren Young <wyml at etr-usa.com>

On Jan 26, 2016, at 8:20 AM, Sylvain CANOINE <sylvain.canoine at tv5monde.org> wrote:
>> * track process lifecycle, and restart (or take other action) on
>> failure. (If software were perfect, this wouldn't be needed, but as
>> is, this can save you being paged in the middle of the night.)
> No. A software which falls down is buggy, and needs to be fixed. Period.

What makes you think that having a supervisor makes you ignore problems?

First off, a restarted daemon process is likely going to lose some kind of runtime state.  Logged-in users will be bounced out, work may be lost, etc.  You will get calls about this.

Second, it’s not like systemd invented this idea and is now trying to convince the rest of the world that it’s a good one.  systemd was preceded by launchd, xinetd, inetd, Erlang supervisors…  My company has prior art on that, for that matter, and I assure you, we don’t just ignore spontaneously restarting servers.

All a supervisor does is replace human working time — “service badboy restart” — with a tiny slice of computer time, reducing the impact of the downed service.  If the daemon doesn’t immediately fail again, the total downtime charged against that daemon might be a fraction of a second.
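
To make that concrete, here is a minimal sketch of a supervisor loop in Python.  It is not how systemd, launchd, or Erlang implement supervision; the function name supervise and its max_restarts and backoff_seconds parameters are made up for illustration.  It restarts a child that exits non-zero, logs every failure so the problem is not ignored, and gives up if the child keeps dying.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, backoff_seconds=0.1):
    """Run argv, restarting it whenever it exits with a non-zero status.

    Returns the number of restarts that were needed. Raises RuntimeError
    once max_restarts is exhausted, so a daemon that fails immediately
    every time does not spin forever.
    """
    restarts = 0
    while True:
        started = time.monotonic()
        exit_code = subprocess.call(argv)
        if exit_code == 0:
            return restarts  # clean exit: nothing left to supervise

        ran_for = time.monotonic() - started
        # Report every failure: the restart shortens the outage,
        # it does not fix the underlying bug.
        print(f"{argv[0]} exited with {exit_code} after {ran_for:.2f}s",
              file=sys.stderr)

        if restarts >= max_restarts:
            raise RuntimeError(f"giving up after {restarts} restarts")
        restarts += 1
        time.sleep(backoff_seconds)
```

Something like supervise(["/usr/sbin/badboy", "--foreground"]) would stand in for the manual “service badboy restart”; that path and flag are placeholders, not a real daemon.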

Yes, you still have to fix the underlying problem.  But are you saying it would be better if your users were completely shut out while emails made their way through the tech support loop?

> will just make GNU/linux more unstable.

Then explain why Ericsson’s Erlang-based AXD301 telephone switching system achieved *nine* nines of uptime, in large part due to a supervisory process restarting framework:


If the supervisors were just restarting frequently-dying processes, don’t you think all the Ericsson-based telephone systems in the world would be noticeably less reliable than, say, the AT&T ones?

>> * actually securely connect output to the process it came from for
>> logging -- both stdout/stderr and actual log messages. (This is why
>> journald is closely integrated.)
> Driving sysadmins unable to read logs just because the file is corrupted, or to send logs to a dedicated server, is a real security improvement, indeed.

Well, in fact, mirroring logs to a trapdoor log server *is* a good security practice.  It makes auditing a pwned system much less uncertain.
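
As an application-level illustration only (on a real system the forwarding is normally done by the syslog daemon, not by each program), here is a sketch using Python's standard logging module to write each message both to a local file and to a remote syslog listener over UDP.  The function name make_mirrored_logger and its parameters are assumptions for the example, and the sketch does nothing to make the remote side write-only; that part is up to how the log server is set up.

```python
import logging
import logging.handlers


def make_mirrored_logger(name, local_path, remote_host, remote_port=514):
    """Return a logger that writes to a local file and mirrors to remote syslog.

    Uses UDP (the SysLogHandler default), so messages are sent without
    waiting for, or requiring, any reply from the log server.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(name)s: %(levelname)s %(message)s")

    local = logging.FileHandler(local_path)
    remote = logging.handlers.SysLogHandler(address=(remote_host, remote_port))

    for handler in (local, remote):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

If the local file is later tampered with, the copy on the log server is what you audit against.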

I’m no huge systemd defender.  It reeks of second system effects, god modules, and other antipatterns of software design.  However, this is not the place to fix it or replace it.

I’m here because I have no intention of fixing it or replacing it.

Archimedes said he could move the world given a long enough lever and a firm place to stand.  Maybe you think this mailing list is your lever of change, but you’ve forgotten that you also need a firm place to stand.  All the force you put on that lever will just push you out of position, rather than move your target.

You need a firmer place to stand.