[CentOS-devel] Upgrading postgresql (Re: CentOS/RHEL 5.3)

Thu Jan 22 17:43:42 UTC 2009
Lamar Owen <lowen at pari.edu>

> Charlie Brady wrote:
>On Fri, 23 Jan 2009, John Summerfield wrote:

>> And then there's postgresql. One has to backup one's data before
>> upgrading major postgresql releases and then restore into the new.

> I consider that a major upstream bug.

Upstream collectively disagrees with you.  Even though a PostgreSQL core developer, Tom Lane, works for Red Hat and packages the RHEL PostgreSQL packages.  Upgrading PostgreSQL is a HARD thing to do without using the documented 'dump-upgrade-initdb-restore' sequence (unless you want to get multiversion installs working, and use Slony to do the migration...good luck with the multiversion! Although Debian has that piece worked out, Debian can do things during install/upgrade in .deb packages that RPM's cannot do).

>However, at the least a %pre script should create an SQL dump before 
>upgrading major releases, so user is not left with an unusable blob.

>Better would be for postgresql to ship a standalone SQL dumper, which can 
>read old file formats.

I maintained the 'PGDG' or community upstream RPM's for PostgreSQL for five years, from 1999 to 2004.  Personal reasons caused me to hand that over to Devrim, the current RPM maintainer lead.  So I've fought with this issue a long time (as John knows).
A %pre scriptlet has no way of reliably detecting whether it is running under Anaconda during a media-fed upgrade or from a fully installed system.  If the %pre scriptlet is running under Anaconda, it cannot do an SQL dump, which requires a major portion of the normal system to be present and running to complete. This is not the case in an Anaconda-mediated upgrade, during which many basic system services are simply not there. 
Then there is the disk space issue (making sure you don't run out).  Oh, and compiled C functions.
There was once a pg_upgrade program that could do some of this stuff; however, it can break in subtle ways.  The PostgreSQL system catalogs that are part of the database contain things that are part of the core system, including functions, operators, and the like.  And sometimes the actual tuple format changes from one release to another...
Now, it has been a while since I last looked at the upgrading situation; if you want to learn more about it, read the archives of the pgsql-hackers list and search for the various and many upgrade discussions.
As a user, I have a CentOS 4 system that at this point in time cannot be upgraded past its current PostgreSQL version due to the need to store photographs from Microsoft Access as large objects.  The support for the method used is not in any subsequent version, and in fact doesn't work on anything but the version shipped with CentOS 4.  We will have to recode the application to get it ported, unfortunately.  Any automatic upgrade of this system would break, and break badly.
The moral is that, as a server administrator, you must be ever diligent to make sure that 'upgraded' software doesn't dramatically break things; and you cannot rely on the OS upgrade to do it.  (I'm thinking BIND, Apache modules, java versions, amavisd and sendmail, among other things that tend to break in upgrades....)
Lamar Owen
Chief Information Officer
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772