[CentOS] Design changes are done in Fedora

Tue Dec 30 05:07:37 UTC 2014
Les Mikesell <lesmikesell at gmail.com>

On Mon, Dec 29, 2014 at 8:04 PM, Warren Young <wyml at etr-usa.com> wrote:
>>>
>>> the world where you design, build, and deploy The System is disappearing fast.
>>
>> Sure, if you don't care if you lose data, you can skip those steps.
>
> How did you jump from incremental feature roll-outs to data loss?  There is no necessary connection there.

No, it's not necessary for either code interfaces or data structures
to change in backward-incompatible ways.  But the people who push one
kind of change aren't likely to care about the other either.

> In fact, I’d say you have a bigger risk of data loss when moving between two systems released years apart than two systems released a month apart.  That’s a huge software market in its own right: legacy data conversion.

I'm not really arguing about the timing of changes, I'm concerned
about the cost of unnecessary user interface changes, code interface
breakage, and data incompatibility, regardless of when it happens.
RHEL's reason for existence is that it mostly shields users from that
within a major release.  That doesn't make it better when it happens
when you are forced to move to the next one.

> If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions.  Or, roll your own; it isn’t hard.

Are you offering to do it for free?
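(For what it's worth, the "roll your own" schema versioning Warren mentions really can be small. A minimal sketch, assuming a hypothetical one-row version table and an ordered list of migration scripts, in Python with sqlite3:)

```python
import sqlite3

# Hypothetical minimal schema-version manager: each migration is one SQL
# statement, applied in order; the current version lives in a one-row table.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    current = row[0] if row else 0
    # Apply only the migrations beyond the recorded version; a second call
    # with the same MIGRATIONS list is a no-op.
    for version, sql in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(sql)
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)   # brings an empty database up to version 2
migrate(conn)   # no-op: already at the latest version
```

(Of course, "it isn't hard" for a toy; the cost argument below is about doing this across many real systems.)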

> You test before rolling something to production, and you run backups so that if all else fails, you can roll back to the prior version.

That's fine if you have one machine and can afford to shut down while
you make something work.   Most businesses aren't like that.

> None of this is revolutionary.  It’s just what you do, every day.

And it is time consuming and expensive.

>> when it breaks it's not the developer answering
>> the phones if anyone answers at all.
>
> Tech support calls shouldn’t go straight to the developers under any development model, short of sole proprietorship, and not even then, if you can get away with it.  There needs to be at least one layer of buffering in there: train up the secretary to some basic level of cluefulness, do everything via email, or even hire some dedicated support staff.
>
> It simply costs too much to break a developer out of flow to allow a customer to ring a bell on a developer’s desk at will.

Beg your pardon?   How about not breaking the things that trigger the
calls in the first place - or taking some responsibility for it.  Do
you think other people have nothing better to do?

> Since we’re contrasting with waterfall development processes that may last many years, but not decades, I’d say the error has already been made if you’re still working with a waterfall-based methodology today.
>

We never change more than half of a load-balanced set of servers at
once.  So all changes have to be compatible when running concurrently,
or worth rolling out a whole replacement farm.
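(Concretely, "compatible when running concurrently" means every server version must tolerate requests shaped for the others. A hypothetical sketch of that tolerance, with made-up field names, in Python:)

```python
import json

# Hypothetical version-tolerant handler: old and new servers sit behind the
# same load balancer, so each must ignore fields it does not know and supply
# defaults for fields the other generation's clients omit.
def handle_request(raw):
    msg = json.loads(raw)
    action = msg["action"]               # present in every version
    user = msg.get("user", "anonymous")  # newer clients send it, old ones don't
    # unknown fields (e.g. a newer "trace_id") are simply ignored
    return {"status": "ok", "action": action, "user": user}

old_client = '{"action": "ping"}'
new_client = '{"action": "ping", "user": "les", "trace_id": "abc123"}'
print(handle_request(old_client))
print(handle_request(new_client))
```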

>> some stuff can't be.
>
> Very little software must be developed in waterfall fashion.

If you run continuous services you either have to be able to run
new/old concurrently or completely duplicate your server farm as you
roll out incompatible clients.

> Last time I checked, this sort of software only accounted for about ~5% of all software produced, and that fraction is likely dropping, with the moves toward cloud services, open source software, subscription software, and subsidized software.
>
> The vast majority of software developed is in-house stuff, where the developers and the users *can* enter into an agile delivery cycle.

OK, but they have to avoid breaking existing interfaces when they do
that.  And that's not the case with OS upgrades.

>> If you are, say, adding up dollars, how many times do you want that
>> functionality to change?
>
> I’m not sure what you’re asking.

I'm asking whether computer science has advanced to the point where
adding up a total needs new functionality, or whether you would like
the same total for the same numbers that you got last year.   Or,
more to the point: if the same program ran correctly last year,
wouldn't it be nice if it still ran the same way this year, in spite
of the OS upgrade you need to do because of the security bugs that
keep getting shipped while developers spend their time making
arbitrary changes to user interfaces?
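(The "same total for the same numbers" point can be made concrete: deterministic dollar arithmetic is a solved problem, which is exactly why nobody should need to re-solve it after an upgrade. A small illustration, mine not Warren's, in Python:)

```python
from decimal import Decimal

# Summing dollar amounts with binary floats drifts; decimal fixed-point
# gives the same exact total today, last year, and next year.
prices = ["0.10", "0.20", "0.30"]

float_total = sum(float(p) for p in prices)      # 0.6000000000000001
decimal_total = sum(Decimal(p) for p in prices)  # Decimal('0.60')

print(float_total == 0.6)                        # False
print(decimal_total == Decimal("0.60"))          # True
```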

> Compare a rolling release model like that of Cygwin or Ubuntu (not LTS).  Something might break every few months, which sounds bad until you consider that the alternative is for *everything* to break at the same time, every 3-7 years.

When your system requires extensive testing, the fewer times it
breaks, the better.  Never would be nice...


>>> I don’t mean that glibly.  I mean you have made a fundamental mistake if your system breaks badly enough due to an OS change that you can’t fix it within an iteration or two of your normal development process.  The most likely mistake is staffing your team entirely with people who have never been through a platform shift before.
>>
>> Please quantify that.  How much should a business expect to spend per
>> person to re-train their operations staff to keep their systems
>> working across a required OS update?  Not to add functionality.  To
>> keep something that was working running the way it was?
>
> If you hire competent people, you pay zero extra to do this, because this is the job they have been hired to do.

That's nonsense for any complex system.   There are always _many_
different OS versions in play and many different development groups
that only understand a subset, and every new change they need to know
about costs time and risks mistakes.

> That's pretty much what IT/custom development is: coping with churn.

And it is expensive.  Unnecessarily so, in my opinion.

>> How many customers for your service did you keep running non-stop
>> across those transitions?
>
> Most of our customers are K-12 schools, so we’re not talking about a 24/7 system to begin with.  K-12 runs maybe 9 hours a day (7am - 4pm), 5 days a week, 9 months out of the year.  That gives us many upgrade windows.

That's a very different scenario than a farm of data servers that have
to be available 24/7.

> We rarely change out hardware or the OS at a particular site.  We generally run it until it falls over, dead.
>
> This means we’re still building binaries for EL3.

I have a few of those, but I don't believe that is a sane thing to recommend.

> This also means our software must *remain* broadly portable.  When we talk about porting to EL7, we don’t mean that it stops working on EL6 and earlier.  We might have some graceful feature degradation where the older OS simply can’t do something the newer one can, but we don’t just chop off an old OS because a new one came out.
>

You'd probably be better off in Java if you aren't already.

>>> Everyone’s moaning about systemd...at least it’s looking to be a real de facto standard going forward.
>>
>> What you expect to pay to re-train operations staff -just- for this
>> change, -just- to keep things working the same..
>
> You ask that as if you think you have a no-cost option in the question of how to address the churn.

I ask it as if I think that software developers could make changes
without breaking existing interfaces.   And yes, I do think they could
if they cared about anyone who built on those interfaces.

>> We've got lots of stuff that will drop into Windows server versions
>> spanning well over a 10 year range.
>
> Yes, well, Linux has always had a problem with ABI stability.  Apparently the industry doesn’t really care about this, evidenced by the fizzling of LSB, and the current attacks on the work at freedesktop.org.  Apparently we’d all rather be fractious than learn to get along well enough that we can nail down some real standards.

Well, that has done a great job of keeping Microsoft in business.

> I’ve never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.  We’ve got SELinux and SystemD, they’ve got UAC, SxS DLLs, API deprecation, and tools that shuffle positions on every release.  (Where did they move the IPv4 configuration dialog this time?!)
>
> We get worked up here about things like the loss of 32-bit support, but over in MS land, they get API-of-the-year.  JET, ODBC, OLE DB, or ADO?  Win32, .NET desktop, Silverlight, or Metro?  GDI, WinG, DirectX, Windows Forms or XAML?  On and on, and that’s just if you stay within the MSDN walls.

Yes, there are changes - and sometimes mysterious breakage.  But an
outright abandonment of an existing interface that breaks previously
working code is pretty rare (and I don't like it when they do it
either...).

>> Were you paying attention when Microsoft wanted to make XP obsolete?
>> There is a lot of it still running.
>
> Were you paying attention when Target’s XP-based POS terminals all got pwned?
>
> Stability and compatibility are not universal goods.

Well, some things you have to get right in the first place - and then
stability is good.

> Google already did that cost/benefit calculation: they tried staying on RH 7.1 indefinitely, and thereby built up 10 years of technical debt.  Then when they did jump, it was a major undertaking, though one they apparently felt was worth doing.

And conversely, they felt it was worth _not_ doing for a very very
long time.   So can the rest of us wait until we have Google's
resources?

>> And why do you think it is a good thing
>> for this to be a hard problem or for every individual user to be
>> forced to solve it himself?
>
> I never said it was a good thing.  I’m just reporting some observations from the field.

Maybe I misunderstood - I thought you were defending the status quo -
and the Fedora developers that bring it to us.

-- 
   Les Mikesell
     lesmikesell at gmail.com