In answer to my own question, this is what I have done which appears to fix the problem:
1. Edit /usr/lib/rpm/macros and change the line
   %__dbi_rebuild nofsync !log !txn !cdb
   to
   %__dbi_rebuild nofsync !log !txn !cdb !thread
2. export LD_ASSUME_KERNEL=2.2.5
3. rpm --rebuilddb
4. unset LD_ASSUME_KERNEL
5. Edit /usr/lib/rpm/macros back to the original
6. rpm --rebuilddb
So far this fixes the problems I have been having. I assume it is possible to override the macro setting on the rpm command line so no editing is required and a simple script can be run on all my machines.
John.
John Newbigin wrote:
I have recently deployed a number of CentOS-3.4 boxes and I am seeing what appears to be rpm database corruption: db4 errors like DB_PAGE_NOTFOUND.
I have found that using LD_ASSUME_KERNEL=2.2.5 seems to fix the problem but I can't find much info on why or if doing that is good or bad.
I have done --rebuilddb but with the LD_ASSUME_KERNEL that might be making things worse... I just don't know.
Does anyone here know anything about this problem?
John.
On Thu, 2005-04-14 at 09:50 +1000, John Newbigin wrote:
In answer to my own question, this is what I have done which appears to fix the problem:
1. Edit /usr/lib/rpm/macros and change the line
   %__dbi_rebuild nofsync !log !txn !cdb
   to
   %__dbi_rebuild nofsync !log !txn !cdb !thread
2. export LD_ASSUME_KERNEL=2.2.5
3. rpm --rebuilddb
4. unset LD_ASSUME_KERNEL
5. Edit /usr/lib/rpm/macros back to the original
6. rpm --rebuilddb
So far this fixes the problems I have been having. I assume it is possible to override the macro setting on the rpm command line so no editing is required and a simple script can be run on all my machines.
I use the --define switch to define rpm variables with rpmbuild ... don't know if it will work with RPM too, but it might. Here is an example:
--define "_build i386-redhat-linux-gnu"
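A minimal sketch of how the override could be checked before running a rebuild, assuming rpm accepts --define together with --eval (it prints the expanded macro and does not touch the database). The variable names DBI_REBUILD and DEFINE_ARG are made up for illustration.

```shell
#!/bin/bash
# Single quotes keep an interactive bash from treating "!" as history expansion.
DBI_REBUILD='nofsync !log !txn !cdb !thread'
DEFINE_ARG="__dbi_rebuild ${DBI_REBUILD}"

if command -v rpm >/dev/null 2>&1; then
    # Print the value rpm would use for %__dbi_rebuild with the override applied.
    rpm --define "$DEFINE_ARG" --eval '%{__dbi_rebuild}'
else
    # No rpm on this machine: just show the argument that would be passed.
    echo "rpm not found; would pass: --define '$DEFINE_ARG'"
fi
```

If the printed value ends in !thread, the same --define argument should be usable in front of --rebuilddb without editing /usr/lib/rpm/macros.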
My final script for anyone who might need it is:
#!/bin/bash
unset LANG
export LD_ASSUME_KERNEL=2.2.5
rpm -qa | wc --lines
rm -rf /var/lib/rpm/__db.00?
rpm --define '__dbi_rebuild nofsync !log !txn !cdb !thread' --rebuilddb
unset LD_ASSUME_KERNEL
rpm -qa | wc --lines
rpm --rebuilddb
rpm -qa | wc --lines
Note: Watch out for the ! characters.
John.
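For running this on many machines, a more cautious variant might copy the database aside first and default to a dry run. The following is only a sketch of that idea: RPMDB_DIR, BACKUP_DIR, DRY_RUN and the helper functions are assumptions for illustration, not part of the script above.

```shell
#!/bin/bash
# Sketch only: RPMDB_DIR, BACKUP_DIR, DRY_RUN and the function names are
# hypothetical and not taken from the original script.
RPMDB_DIR="${RPMDB_DIR:-/var/lib/rpm}"
BACKUP_DIR="${BACKUP_DIR:-/var/tmp/rpmdb-backup.$(date +%Y%m%d%H%M%S)}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rebuild_rpmdb() {
    # Keep a copy of the database in case the rebuild makes things worse.
    run cp -a "$RPMDB_DIR" "$BACKUP_DIR"
    # Remove the stale Berkeley DB environment files.
    run rm -f "$RPMDB_DIR"/__db.00?
    # First pass: LD_ASSUME_KERNEL set for this one command only, threads off.
    run env LD_ASSUME_KERNEL=2.2.5 \
        rpm --define '__dbi_rebuild nofsync !log !txn !cdb !thread' --rebuilddb
    # Second pass with the default settings.
    run rpm --rebuilddb
}

rebuild_rpmdb
```

With the default DRY_RUN=1 the script only prints the four commands; setting DRY_RUN=0 as root would actually run them. Passing LD_ASSUME_KERNEL through env limits it to the first rebuild rather than exporting it for the whole script.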
Johnny Hughes wrote:
I use the --define switch to define rpm variables with rpmbuild ... don't know if it will work with RPM too, but it might. Here is an example:
--define "_build i386-redhat-linux-gnu"
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On 4/13/05, John Newbigin jnewbigin@ict.swin.edu.au wrote:
My final script for anyone who might need it is:
#!/bin/bash
unset LANG
export LD_ASSUME_KERNEL=2.2.5
rpm -qa | wc --lines
rm -rf /var/lib/rpm/__db.00?
rpm --define '__dbi_rebuild nofsync !log !txn !cdb !thread' --rebuilddb
unset LD_ASSUME_KERNEL
rpm -qa | wc --lines
rpm --rebuilddb
rpm -qa | wc --lines
Note: Watch out for the ! characters.
John,
What are you doing that causes the corruption? Can you recreate it? I have a test harness for rpm, and I have run thousands, perhaps tens of thousands, of rpm transactions with it and have not seen this corruption. Am I reading you right that you're turning off NPTL by doing the LD_ASSUME_KERNEL stuff and turning off the use of threads in rpm when doing the rebuild?
I do know of a deadlock issue with scriptlets due to an incorrect use of pthread_cond_*:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146549
The patch I attached to the bugzilla report fixes this.
Just so you know I am the maniac that wrote the autorollback patch for rpm:
http://lee.k12.nc.us/~joden/misc/patches/rpm/
And beyond caring about making rpm able to provide rollback mechanisms that make a reliable rollback of an upgrade possible (I say possible because there is only so much you can do about what people do in their scriptlets), I care about anything that causes rpm to be unreliable and unstable. So in short, I am really interested in your problem.
Cheers...james
See my post to the rpm list for more details https://www.redhat.com/archives/rpm-list/2005-April/msg00032.html
The history of the machines is a bit more complex because they are all imaged over the network (via bbc-lnx and a custom tool called dart) and have had "rpm --rebuilddb" executed on them in a chroot while running the bbc kernel. The reason I --rebuilddb after imaging is that I have always had problems with the RPM database after imaging, as far back as RH7.2. The files are restored exactly the same, with the same parts of the files sparse and md5sums are the same, but RPM still would have errors. After a --rebuilddb they always came good.
Although all machines are imaged identically, not all machines had the original problem/lockup.
I can't remember exactly what package I was installing when I had the original problem, I think it was part of my qmail repo but that is made up of a number of packages...
I have tried to reproduce it without success. I might have another go this afternoon and see what I can break.
John.
James Olin Oden wrote:
John,
What are you doing that causes the corruption? Can you recreate it? I have a test harness for rpm, and I have run thousands, perhaps tens of thousands, of rpm transactions with it and have not seen this corruption. Am I reading you right that you're turning off NPTL by doing the LD_ASSUME_KERNEL stuff and turning off the use of threads in rpm when doing the rebuild?
I do know of a deadlock issue with scriptlets due to an incorrect use of pthread_cond_*:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146549
The patch I attached to the bugzilla report fixes this.
Just so you know I am the maniac that wrote the autorollback patch for rpm:
http://lee.k12.nc.us/~joden/misc/patches/rpm/
And beyond caring about making rpm able to provide rollback mechanisms that make a reliable rollback of an upgrade possible (I say possible because there is only so much you can do about what people do in their scriptlets), I care about anything that causes rpm to be unreliable and unstable. So in short, I am really interested in your problem.
Cheers...james