Not sure where to ask this, I've been googling and not finding much that is helpful.
At work I've got a multi-threaded program targeted at Linux. It compiles on RHEL 2.1 and 3, and is targeted at 2.1, 3, and 4. Up until today, the binary built on 3 has worked fine on 4.
But on RHEL4 update 4 it dies a horrible death in pthread_create. I have reproduced the problem on CentOS 4.4, where I am trying to debug it.
I thought it might be some newly introduced binary incompatibility in the U4 update, so I recompiled it on CentOS 4.4, and it does the same thing.
This code has been running for 5 years both in-house and at dozens of customers, on versions of RH Linux ranging from 6.2 and 7.x to RHEL 2.1 and 3, so I'm leaning toward something other than a blatant bug in the code.
Anyway, on either the 2nd or the 3rd call to pthread_create (it doesn't seem to be consistently one or the other) we get a SIGSEGV. Looking at a stack backtrace, the stack appears to be trashed, or at least the backtrace does not reflect reality. Single-stepping into the call doesn't get you anywhere except a SIGSEGV. I'm about to delve into it at the instruction level, but I must admit I'm not very knowledgeable about 32-bit Intel assembly language.
Needless to say I've read and re-read and re-re-read the man page for pthread_create(), and have been tweaking the calling sequence in subtle ways to see if I can change the behavior, but so far to no avail.
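For readers who want to compare against a known-good calling sequence, here is a minimal sketch of creating and joining a few threads with every return code checked. It is not the poster's code: the names worker, spawn_and_join and MAX_THREADS are made up for illustration, and default thread attributes (NULL) are an assumption. Note that pthread_create returns an error number directly rather than setting errno.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREADS 16   /* hypothetical upper bound for this sketch */

/* Hypothetical worker: does no work and simply returns its argument. */
void *worker(void *arg)
{
    return arg;
}

/*
 * Create `count` threads with default attributes, then join the ones
 * that were created. Returns 0 on success, EINVAL for a bad count, or
 * the first nonzero error number from pthread_create/pthread_join.
 */
int spawn_and_join(int count)
{
    pthread_t tids[MAX_THREADS];
    int created = 0, first_err = 0, i, rc;

    if (count < 0 || count > MAX_THREADS)
        return EINVAL;

    for (i = 0; i < count; i++) {
        /* NULL attributes: default stack size, joinable thread. */
        rc = pthread_create(&tids[i], NULL, worker, NULL);
        if (rc != 0) {
            /* The error is the return value; errno is not involved. */
            fprintf(stderr, "pthread_create #%d: %s\n", i + 1, strerror(rc));
            first_err = rc;
            break;
        }
        created++;
    }

    /* Join only the threads that were actually created. */
    for (i = 0; i < created; i++) {
        rc = pthread_join(tids[i], NULL);
        if (rc != 0 && first_err == 0)
            first_err = rc;
    }
    return first_err;
}
```

Calling spawn_and_join(3) from a small main and building with gcc -pthread gives a baseline: if even this crashes on the affected box, the application code is less likely to be at fault.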
Has anyone here any knowledge of possible problems or incompatibilities in the NPTL implementation in 4.4?
Thanks!
fredex wrote:
Not sure where to ask this, I've been googling and not finding much that is helpful.
At work I've got a multi-threaded program targeted at Linux. It compiles on RHEL 2.1 and 3, and is targeted at 2.1, 3, and 4. Up until today, the binary built on 3 has worked fine on 4.
But on RHEL4 update 4 it dies a horrible death in pthread_create. I have reproduced the problem on CentOS 4.4, where I am trying to debug it.
This sounds like the sort of thing that's likely to be outside the expertise available here, particularly in any quantity.
Have you had a chat with your RHEL support contact? Even if they can't help directly, they might be able to call on someone or point you to a better forum, possibly one frequented by some folk who write/maintain gcc.
On Fri, Sep 08, 2006 at 09:28:31AM +0800, John Summerfield wrote:
This sounds like the sort of thing that's likely to be outside the expertise available here, particularly in any quantity.
Have you had a chat with your RHEL support contact? Even if they can't help directly, they might be able to call on someone or point you to a better forum, possibly one frequented by some folk who write/maintain gcc.
Thanks for the suggestion.
I haven't contacted RH yet because my RHEL4 box is currently down due to a hardware problem, and I know they won't talk to me if I say I can reproduce it on a CentOS box! :)
fredex wrote:
I haven't contacted RH yet because my RHEL4 box is currently down due to a hardware problem, and I know they won't talk to me if I say I can reproduce it on a CentOS box! :)
Stupid question, but is the box in question an i586 by any chance? There used to be issues with NPTL and i586 a long time ago in FC2, which were mostly fixed in FC3. The fix for RHEL4 was simply not to support the i586 architecture (the CentOS project re-added it).
So if it is an old i586 box, Red Hat ain't gonna talk to you since RHEL4 is not supposed to run on an i586.
The problem with NPTL on i586 (in FC2) was that Red Hat provided only i386 and i686 versions of the glibc package. NPTL can't be efficiently implemented on an i386 (it would be horribly slow), so the i386 build didn't support it, and things did not work on i586 machines (the kernel supported NPTL, but glibc didn't). Many packages were broken because of that, most notably db3 (or db4, I don't remember which one shipped with FC2) and cyrus-imapd. The workaround in FC3 was an i386 package that wasn't really i386-clean: it contained NPTL support using instructions that are available only on i486 and later processors (the glibc package should really have been labeled i486, not i386). In short, the i386 version of glibc in FC3 can't run on a real i386 processor (it needs an i486 at minimum). However, since they provided only i586 and i686 kernels, it wasn't possible (or at least not trivial) to install FC3 on a real i386. And I really doubt anybody would ever attempt to install FC3 on a real i386 box.
I believe (somebody correct me if I'm wrong) that CentOS 4 contains the same hack in the i386 version of glibc (in order to support i586 processors). If you are experiencing the problem on an i586 box, it is possible that the maintainer of the glibc package in CentOS forgot to include that "use i486 instructions for NPTL support when building i386 glibc" hack in the latest version of glibc (4.4 shipped with a new version of glibc).
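To make the "i486 instructions" point concrete: the post does not name them, but the i486 is the first x86 processor with atomic compare-and-exchange (cmpxchg) and exchange-and-add (xadd), which is presumably what is meant. The sketch below only illustrates the kind of primitive involved; it is not glibc's NPTL code. The function name cas_add is made up, and it assumes a GCC-compatible compiler that provides the __sync_bool_compare_and_swap builtin.

```c
/*
 * Atomically add `delta` to *counter using a compare-and-swap retry
 * loop, and return the new value. Illustration only; on x86 the
 * builtin is expected to compile down to a locked cmpxchg, which a
 * real i386 does not have.
 */
int cas_add(volatile int *counter, int delta)
{
    int old, desired;

    do {
        old = *counter;          /* snapshot the current value */
        desired = old + delta;   /* value we would like to store */
        /* The store happens only if *counter still equals `old`;
           otherwise another thread got in first and we retry. */
    } while (!__sync_bool_compare_and_swap(counter, old, desired));

    return desired;
}
```

Without a hardware compare-and-swap, a threading library has to fall back on slower mechanisms for this kind of update, which is consistent with the "horribly slow" remark above.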
Aleksandar Milivojevic wrote:
Stupid question, but is the box in question an i586 by any chance? There used to be issues with NPTL and i586 a long time ago in FC2, which were mostly fixed in FC3. The fix for RHEL4 was simply not to support the i586 architecture (the CentOS project re-added it).
So if it is an old i586 box, Red Hat ain't gonna talk to you since RHEL4 is not supposed to run on an i586.
I'm sceptical. Fred says it happens on RHEL (but his RHEL box is down) and CentOS 4. He might be running a sort of living museum, I suppose, but he also indicated he can talk to RH about it (when his RHEL box is back up), so I guess it's not a Pentium.
Nice trivia tho:-)
I decided to chime in again to note that the only systems Fred says it fails on are running 2.6 kernels.
It might be worth booting an older 2.6 kernel to see whether that works around the problem.
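When comparing a working box against a failing one, it helps to record exactly which kernel and which thread library each is running. The following is a rough sketch, assuming glibc (the _CS_GNU_LIBPTHREAD_VERSION name is a glibc extension, so it is guarded); the function names kernel_release, thread_library and print_environment are made up for illustration.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/utsname.h>

/* Copy the running kernel's release string into buf.
   Returns 0 on success, -1 if uname() fails. */
int kernel_release(char *buf, size_t len)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    snprintf(buf, len, "%s", u.release);
    return 0;
}

/* Copy the thread library description reported by the C library into
   buf. Returns 0 on success, -1 if it is unavailable or does not fit. */
int thread_library(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    return (n > 0 && n <= len) ? 0 : -1;
#else
    (void)buf;
    (void)len;
    return -1;   /* not glibc: the name is not defined */
#endif
}

/* Print both values, or "unknown" where a lookup fails. */
void print_environment(void)
{
    char kernel[128], threads[128];

    printf("kernel:  %s\n",
           kernel_release(kernel, sizeof kernel) == 0 ? kernel : "unknown");
    printf("threads: %s\n",
           thread_library(threads, sizeof threads) == 0 ? threads : "unknown");
}
```

Calling print_environment() from a small main on each machine gives two lines to diff. From a shell, uname -r and getconf GNU_LIBPTHREAD_VERSION report the same information.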
On Fri, Sep 08, 2006 at 01:13:18PM +0800, John Summerfield wrote:
I'm sceptical. Fred says it happens on RHEL (but his RHEL box is down) and CentOS 4. He might be running a sort of living museum, I suppose, but he also indicated he can talk to RH about it (when his RHEL box is back up), so I guess it's not a Pentium.
Nice trivia tho:-)
I decided to chime in again to note that the only systems Fred says it fails on are running 2.6 kernels.
It might be worth booting an older 2.6 kernel to see whether that works around the problem.
Nope, it's a P4. We have lots of customers running it on RHEL3. Apparently the people who actually build the machines for shipment have just started using RHEL4. One that is already in the field was built with the originally shipped CDs, which are not even at U1 level, and it apparently works fine. A new one they've just built, which hasn't shipped yet, used the U6 CDROMs, and it fails. Since it's across the country from me (the programmer), it's easier for me to debug locally, and since what I happen to have available at the same update level is CentOS 4.4, I am working on that platform. It may be necessary to delay until I can get the RHEL4 box working again, simply so I can talk to Red Hat about it.
--
Cheers John
Please do not reply off-list
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Thu, Sep 07, 2006 at 10:27:13PM -0500, Aleksandar Milivojevic wrote:
Stupid question, but is the box in question an i586 by any chance? There used to be issues with NPTL and i586 a long time ago in FC2, which were mostly fixed in FC3. The fix for RHEL4 was simply not to support the i586 architecture (the CentOS project re-added it).
So if it is an old i586 box, Red Hat ain't gonna talk to you since RHEL4 is not supposed to run on an i586.
Thanks for the info, I was unaware of those items.
But no, it's a P4 (an HP DL320 G2 server box, in fact).
fredex wrote:
At work I've got a multi-threaded program targeted at Linux. It compiles on RHEL 2.1 and 3, and is targeted at 2.1, 3, and 4. Up until today, the binary built on 3 has worked fine on 4.
But on RHEL4 update 4 it dies a horrible death in pthread_create. I have reproduced the problem on CentOS 4.4, where I am trying to debug it.
Do you think this is in any way related to the libstdc++ issue I mentioned here a few days ago (without any replies, unfortunately :( )?
http://lists.centos.org/pipermail/centos/2006-September/069496.html
Tim