[CentOS] Design changes are done in Fedora

On Dec 29, 2014, at 4:03 PM, Les Mikesell <lesmikesell at gmail.com> wrote:

> On Mon, Dec 29, 2014 at 3:03 PM, Warren Young <wyml at etr-usa.com> wrote:
>> 
>> the world where you design, build, and deploy The System is disappearing fast.
> 
> Sure, if you don't care if you lose data, you can skip those steps.

How did you jump from incremental feature roll-outs to data loss?  There is no necessary connection there.

In fact, I’d say you have a bigger risk of data loss when moving between two systems released years apart than two systems released a month apart.  That’s a huge software market in its own right: legacy data conversion.

If your software is DBMS-backed and a new feature changes the schema, you can use one of the many available systems for managing schema versions.  Or, roll your own; it isn’t hard.
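
A minimal sketch of the roll-your-own approach, assuming SQLite and Python purely for illustration. The `student` table, the `MIGRATIONS` list, and the `open_db` and `migrate` helpers are all hypothetical names, not anything from a real system. It keeps the schema version in SQLite's `user_version` pragma and applies only the migrations the database has not yet seen:

```python
import sqlite3

# Hypothetical migrations, in order. Applying entry N brings the schema to version N + 1.
MIGRATIONS = [
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE student ADD COLUMN grade INTEGER",
]

def open_db(path):
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below control the transactions.
    return sqlite3.connect(path, isolation_level=None)

def migrate(conn):
    """Apply migrations newer than the stored version; return the resulting version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for target, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(statement)
            # PRAGMA does not accept bound parameters; target is always an int here.
            conn.execute(f"PRAGMA user_version = {target}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `migrate(open_db("site.db"))` on every start-up is then enough: an up-to-date database is left alone, and an old one is walked forward one step at a time.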

You test before rolling something to production, and you run backups so that if all else fails, you can roll back to the prior version.
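
The backup-then-roll-back habit can be sketched in a few lines. This is a toy under stated assumptions: it uses SQLite's online backup API from Python, keeps the snapshot in memory rather than on real backup media, and `upgrade_with_backup` is a hypothetical helper name:

```python
import sqlite3

def upgrade_with_backup(conn, upgrade):
    """Snapshot the database, run upgrade(conn), and restore the snapshot on failure."""
    snapshot = sqlite3.connect(":memory:")
    conn.backup(snapshot)          # full copy of the current state
    try:
        upgrade(conn)
        conn.commit()
    except Exception:
        conn.rollback()            # drop any half-finished transaction
        snapshot.backup(conn)      # overwrite the database with the snapshot
        raise
    finally:
        snapshot.close()
```

A real deployment would write the snapshot to a file on separate storage, but the shape is the same: copy first, change second, restore if the change fails.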

None of this is revolutionary.  It’s just what you do, every day.

> when it breaks it's not the developer answering
> the phones if anyone answers at all.

Tech support calls shouldn’t go straight to the developers under any development model, short of sole proprietorship, and not even then, if you can get away with it.  There needs to be at least one layer of buffering in there: train up the secretary to some basic level of cluefulness, do everything via email, or even hire some dedicated support staff.

Letting a customer ring a bell on a developer’s desk at will simply costs too much, because every call breaks the developer out of flow.

>> The world is moving toward incrementalism, where the first version of The System is the smallest thing that can possibly do anyone any good.  That is deployed ASAP, and is then built up incrementally over years.
> 
> That works if it was designed for rolling updates.  Most stuff isn’t,

Since we’re contrasting with waterfall development processes that may last many years, but not decades, I’d say the error has already been made if you’re still working with a waterfall-based methodology today.

The first strong cases for agile development processes were made about 15 years ago, so anything started 7 years ago (to use the OP’s example) was already disregarding a shift a full software generation old.

> some stuff can't be.

Very little software must be developed in waterfall fashion.

Avionics systems and nuclear power plant control systems, for example.  Such systems make up a tiny fraction of all software produced.

A lot of commercial direct-to-consumer software also cannot be delivered incrementally, but only because the alternative messes with the upgrade treadmill business model.

Last time I checked, this sort of software only accounted for about 5% of all software produced, and that fraction is likely dropping, with the moves toward cloud services, open source software, subscription software, and subsidized software.

The vast majority of software developed is in-house stuff, where the developers and the users *can* enter into an agile delivery cycle.

>> Instead of trying to go from 0 to 100 over the course of ~7 years, you deliver new functionality to production every 1-4 weeks, achieving 100% of the desired feature set over the course of years.
> 
> If you are, say, adding up dollars, how many times do you want that
> functionality to change?

I’m not sure what you’re asking.

If you’re talking about a custom accounting system, the GAAP rules change several times a year in the US:

   http://www.fasb.org/jsp/FASB/Page/SectionPage&cid=1176156316498

The last formal standard put out by FASB was in 2009, and they’re working on another version all the time.  Chances are good that if you start a new 7-year project, a new standard will be out before you finish.

If instead you’re talking about the cumulative cost of incremental change, it shouldn’t be much different than the cost of a single big-bang change covering the same period.

In fact, I’d bet the incremental changes are easier to adopt, since each change can be learned piecemeal.  A lot of what people are crying about with EL7 comes down to the fact that Red Hat is basically doing waterfall development: many years of cumulative changes get dumped on our HDDs in one big lump.

Compare a rolling release model like that of Cygwin or Ubuntu (not LTS).  Something might break every few months, which sounds bad until you consider that the alternative is for *everything* to break at the same time, every 3-7 years.

I’m not arguing for CentOS/RHEL to turn into Ubuntu Desktop.  I’m just saying that there is a cost for stability: every 3-7 years, you must hack your way through a big-bang change bolus.

(6-7 years being for those organizations that skip every other major release by taking advantage of the way the EL versions overlap.  EL5 was still sunsetting as EL7 was rising.)

>> This isn’t pie-in-the-sky theoretical BS.  This is the way I’ve been developing software for decades, as have a great many others.  Waterfall is dead, hallelujah!
> 
> How many people do you have answering the phone about the wild and
> crazy changes you are introducing weekly?

The burden of tech support has more to do with proper QA and roll-out strategies than with the frequency of updates.

For the most part, we roll new code to a site in response to a support call, rather than field calls in response to an update.  The new version solves their problem, and we don’t hear back from them for months or years.

We don’t update all sites to every new release.  We merely ship *a* new release every 1-4 weeks, which goes out to whoever needs the new features and fixes.  It’s also what goes out on each new server we ship.

> How much does it cost to train them?

Most of our sites get only one training session, shortly after the new system is first set up.

We rarely get asked to do any follow-up training.  The users typically pick up on the incremental feature updates as they happen, without any additional help from us.  We attribute that to solid UX design.

That first session is mostly about giving the new users an idea of what the system can do.  We teach them enough to teach themselves.

How often do most people get trained to use a word processor?  I’ll bet a lot of people got trained just once, in grade school.  They just cope with changes as they come.

The worst changes are when you skip many versions.  Word 97 to Word 2007, for example. *shudder*

>> I don’t mean that glibly.  I mean you have made a fundamental mistake if your system breaks badly enough due to an OS change that you can’t fix it within an iteration or two of your normal development process.  The most likely mistake is staffing your team entirely with people who have never been through a platform shift before.
> 
> Please quantify that.  How much should a business expect to spend per
> person to re-train their operations staff to keep their systems
> working across a required OS update?  Not to add functionality.  To
> keep something that was working running the way it was?

If you hire competent people, you pay zero extra to do this, because this is the job they have been hired to do.

That's pretty much what IT/custom development is: coping with churn.

Most everything you do on a daily basis is a reaction to some change external to the IT/development organization:

- Capacity increases

- Obsolete ‘ware upgrades

- New seat/site deployments

- Failed equipment replacements

- Compatibility breakage repair (superseded de facto standard, old de jure standard replaced, old proprietary item no longer available…)

- Tracking business rule change (GAAP, regulations, mergers…)

- Effecting business change (entering new markets, automation, solving new problems developing from new situations…)

- Tracking business strategy change (new CEO, market shift…)

Setting aside retail software development, IT and internal development organizations *should* be chasing this kind of thing, not being “proactive.”  We’re not trying to surprise our users with things they didn’t even ask for, we’re trying to solve their problems.

Maybe we solve problems in a *manner* our users did not expect — hopefully a better way — but we’re rarely trying to innovate, as such.

> how much developer time would you expect to spend to
> follow the changes and perhaps eventually make something work better?

Pretty much 100%, after subtracting overhead.  (Meetings, email, breaks, reading…)

Again: This is what we do.  Some new thing happens in the world, and we go out and solve the resulting problems.

The only question is one of velocity: the more staff you add, the faster you go.  So, how fast do you want to go?

(Yes, I’ve read “The Mythical Man Month.”  The truths within that fine book don’t change the fact that Microsoft can develop a new OS faster than I can all by my lonesome.)

>> The software system I’ve been working on for the past 2 decades has been through several of these platform changes.
> 
> How many customers for your service did you keep running non-stop
> across those transitions?

Most of our customers are K-12 schools, so we’re not talking about a 24/7 system to begin with.  K-12 runs maybe 9 hours a day (7am - 4pm), 5 days a week, 9 months out of the year.  That gives us many upgrade windows.
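
As a rough worked example of how much room that leaves, the figures above can be turned into hours. The 39-week school year is an assumption standing in for "9 months", and holidays are ignored:

```python
# Figures from the text: 9 hours a day, 5 days a week, 9 months a year.
HOURS_PER_DAY = 9
DAYS_PER_WEEK = 5
WEEKS_PER_SCHOOL_YEAR = 39   # assumption: 9 months taken as roughly 39 weeks

in_use_hours = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_SCHOOL_YEAR
hours_per_year = 365 * 24
idle_fraction = 1 - in_use_hours / hours_per_year

print(f"in use {in_use_hours} h of {hours_per_year} h; idle {idle_fraction:.0%}")
```

Under those assumptions the system is idle for roughly four-fifths of the year.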

We rarely change out hardware or the OS at a particular site.  We generally run it until it falls over, dead.

This means we’re still building binaries for EL3.

This also means our software must *remain* broadly portable.  When we talk about porting to EL7, we don’t mean that it stops working on EL6 and earlier.  We might have some graceful feature degradation where the older OS simply can’t do something the newer one can, but we don’t just chop off an old OS because a new one came out.
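
Graceful degradation of this kind usually comes down to probing for a capability at run time instead of checking the OS version. A minimal sketch, using Python only for illustration (the language and the `usable_cpus` helper are assumptions, not a description of the software discussed here):

```python
import os

def usable_cpus():
    """Count the CPUs this process may use, degrading gracefully on older platforms."""
    if hasattr(os, "sched_getaffinity"):
        # Newer platform: honours any CPU affinity mask set on the process.
        return len(os.sched_getaffinity(0))
    # Older platform: fall back to the total CPU count, or 1 if even that is unknown.
    return os.cpu_count() or 1
```

The same binary or script then runs everywhere. It simply gives a less precise answer where the newer facility does not exist.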

All that having been said, we do occasionally roll a change to a site, live. We can usually do it in such a way that the site users never even notice the change, except for the changed behavior.

This is not remarkable.  It’s one of the benefits you get from modern centralized software development and deployment stacks.

>> Everyone’s moaning about systemd...at least it’s looking to be a real de facto standard going forward.
> 
> What you expect to pay to re-train operations staff -just- for this
> change, -just- to keep things working the same..

You ask that as if you think you have a no-cost option in the question of how to address the churn.

Your only choices are:

1. Don’t upgrade

2. Upgrade and cope

3. Switch to something else

Each path carries a cost.

You think path 1 is free?  If you skip EL7, you’re just batching up the changes.  You’ll pay eventually, when you finally adopt a new platform.  One change set plus one change set equals about 1.9 change sets, plus compound penalties.

Penalties?  Yes.

You know the old joke about how you eat an elephant? [*]  By the time you eat 1.9 elephants, you’ve probably built up another ~0.3 change sets’ worth of new problems.  Time you spend grinding through nearly two full change sets is time you don’t spend keeping your current backlog short.

We call this technical debt in the software development world.  It’s fine to take out a bit of technical debt occasionally, as long as you don’t let it build up too long.  The longer you let it build, the more the interest & penalties accrue, so the harder it is to pay down.

> We've got lots of stuff that will drop into Windows server versions
> spanning well over a 10 year range.

Yes, well, Linux has always had a problem with ABI stability.  Apparently the industry doesn’t really care about this, as evidenced by the fizzling of LSB, and the current attacks on the work at freedesktop.org.  Apparently we’d all rather be fractious than learn to get along well enough that we can nail down some real standards.

Once again, though, there’s a fine distinction between stable and moribund.

> And operators that don't have a
> lot of special training on the differences between them.

I’ve never done much with Windows Server, but my sense is that they have plenty of churn over in their world, too.  We’ve got SELinux and systemd, they’ve got UAC, SxS DLLs, API deprecation, and tools that shuffle positions on every release.  (Where did they move the IPv4 configuration dialog this time?!)

We get worked up here about things like the loss of 32-bit support, but over in MS land, they get API-of-the-year.  JET, ODBC, OLE DB, or ADO?  Win32, .NET desktop, Silverlight, or Metro?  GDI, WinG, DirectX, Windows Forms or XAML?  On and on, and that’s just if you stay within the MSDN walls.

>> Could it be that software for these other platforms *also* manages to ride through major breaking changes?
> 
> Were you paying attention when Microsoft wanted to make XP obsolete?
> There is a lot of it still running.

Were you paying attention when Target’s XP-based POS terminals all got pwned?

Stability and compatibility are not universal goods.

>>> What enterprise can afford to rewrite all of its software
>>> every ten years?
>> 
>> Straw man.
> 
> Not really.  Ask the IRS what platform they use.   And estimate what
> it is going to cost us when they change.

Monopolies are inherently inefficient and plodding.  Government is special only because it is the biggest monopoly.

(That’s why we have antitrust law: not because it’s good for the consumer, but because it fights the trend toward zaibatsu rule.)

Few organizations are working under such stringent constraints, if only because it’s a danger to the health of the organization.  Only monopolies can get away with it.

>> (The long dragging life of XP is an exception.  Don’t expect it to occur ever again.)
> 
> No, that is the way things work.   And the reason Microsoft is in business.

Microsoft stopped retail sale of Windows 7 a few months ago, and Vista back in April.

A few months ago, there was a big stink when MS killed off Windows 8.0 updates, requiring that everyone upgrade to 8.1.

Yes, I know about downgrade rights for pro versions of Windows.

Nevertheless, the writing is on the wall.

>> while your resources aren’t as extensive as Google’s, your problem isn’t nearly as big as Google’s, either.
> 
> So again, quantify that.  How much should it cost a business _just_ to
> keep working the same way?

Google already did that cost/benefit calculation: they tried staying on RH 7.1 indefinitely, and thereby built up 10 years of technical debt.  Then when they did jump, it was a major undertaking, though one they apparently felt was worth doing.

There’s a cost to staying put, too.

> And why do you think it is a good thing
> for this to be a hard problem or for every individual user to be
> forced to solve it himself?

I never said it was a good thing.  I’m just reporting some observations from the field.

—————

[*] One bite at a time.