List,
It looks like I may have to drop back to a 32-bit version of the OS due to some non-resolvable library issues for the software I'm attempting to compile, and want to ask: how far back can I go with CentOS without losing too much of the functionality of the 4.1 version? Essentially what I need is the equivalent of RHES 3. The other thing I'd thought about would be to install the needed gnu library, but in a non-standard place, like /home/sam or some other user directory. Since the software only needs this at runtime and not during linking, I don't see what it would hurt. Given that idea, how would I use RPM to install that library? I read the man page, but did not see much in the way of options for installing to a non-std location without having the system actually "recognize" it as being there.
suggestions?
You can't drop back to the 32-bit version of CentOS 4?
Sam Drinkard wrote:
Unfortunately, no. The current version of the x86-64 does not have, and I can't install, the glibc-2.3.3 libs to run the software, and I'm pretty sure the i386 version would be the same, however I was hoping someone could tell me. There should be some backwards compatibility, but apparently there is a specific call in the 2.3.3 lib that is not available in the 2.3.4.
William Warren wrote:
Your other option is CentOS 3 which should suffice..:)
Sam Drinkard wrote:
The problem is probably not so much with the OS as it is with the several software packages involved. First, there is the WRF numerical weather model, then there is the MPICH implementation, then there are the netcdf libraries, and lastly, the Portland Group Fortran compiler. From what I've been able to glean from archived messages, support links, and support staff as well, this whole mess is probably not even close to working under the conditions of the compile, however it will compile and run under the single threaded model, with no nesting. The output of the attempted execution is:
./real.exe: /usr/pgi/linux86-64/6.0/lib/libpthread.so.0: version `GLIBC_2.3.3' not found (required by /lib64/tls/librt.so.1)
The three executables, real.exe, wrf.exe, and ndown.exe all compile with no errors. The environment it *should* be running under of course would be either open mp, or mpich if there were a cluster, however in this situation, there is but the single machine, a dual Xeon system with the 4.1 release of CentOS. Attempting to execute under OMP or mpich both give the same error message as above, so there is definitely a runtime problem. I am not positive if it is coming from the portland group compiler output, the mpi compiler, or what, but from the message, it leads me to believe it could in fact be a portland group compile problem. Others have built the software and have it run under earlier versions of RH, FC, etc., but I've not seen or read of anyone using any of the 4.x branches of any of the OSes. I am not a programmer in any sense of the word. So, that's pretty much the extent of the situation. I do know there is a problem with the intel fortran compiler 9.0.21, and the software will NOT build or run under that situation. Intel is supposedly working on a fix, but it takes about 90+ minutes to compile the code, and from what I've seen, 1.3 GB of memory at most during the compile process.
Maciej Żenczykowski wrote:
./real.exe: /usr/pgi/linux86-64/6.0/lib/libpthread.so.0: version `GLIBC_2.3.3' not found (required by /lib64/tls/librt.so.1)
Ok, on my Centos 4.1 i386 system I see the following:
[root@tcs lib]# strings /lib/libpthread-0.10.so | grep GLIBC_2.3
GLIBC_2.3.2
GLIBC_2.3
[root@tcs lib]# strings /lib/librt-2.3.4.so | grep GLIBC_2.3
GLIBC_2.3.4
GLIBC_2.3.2
[root@tcs lib]# strings /lib/tls/librt-2.3.4.so | grep GLIBC_2.3
GLIBC_2.3.4
GLIBC_2.3.3
GLIBC_2.3.2
Which leads me to believe that maybe we shouldn't be using the tls version...
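For anyone wanting to repeat that comparison on their own libraries, here is a minimal sketch. The function name glibc_tags is made up for illustration, and it uses grep -a -o rather than strings (so it does not depend on binutils being installed). It just lists every GLIBC_x.y tag embedded in a file and does not distinguish versions the library provides from versions it requires.

```shell
#!/bin/sh
# glibc_tags (hypothetical helper): list the distinct GLIBC_<version> tags
# embedded in a file. grep -a treats the binary as text and -o prints only
# the matching part of each line.
glibc_tags() {
    LC_ALL=C grep -a -o 'GLIBC_[0-9][0-9.]*' "$1" | LC_ALL=C sort -u
}
```

Something like "glibc_tags /lib64/tls/librt.so.1" should then show whether GLIBC_2.3.3 appears in that file.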
try running the program(s) with
LD_ASSUME_KERNEL=2.x.y ./real.exe
with different versions of x.y (basically from 2.2.y 2.4.y and 2.6.y) maybe some value will work :)
or you can try temporarily renaming the offending /lib64/tls/librt.so.1 symlink/file
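The LD_ASSUME_KERNEL trial and error suggested above could be scripted roughly as follows. This is only a sketch: the function name try_assume_kernel and the list of candidate versions are assumptions picked for illustration, and it treats a zero exit status as "worked", which may not be the right test for every program.

```shell
#!/bin/sh
# try_assume_kernel (hypothetical helper): run a command under several
# LD_ASSUME_KERNEL values and print the first value for which the command
# exits successfully. The candidate versions are illustrative guesses.
try_assume_kernel() {
    prog=$1
    shift
    for ver in 2.2.5 2.4.1 2.4.19 2.6.9; do
        # The assignment applies only to this one invocation.
        if LD_ASSUME_KERNEL=$ver "$prog" "$@" >/dev/null 2>&1; then
            echo "$ver"
            return 0
        fi
    done
    return 1
}
```

Usage would be along the lines of "try_assume_kernel ./real.exe". If nothing is printed, none of the listed values helped.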
Cheers, MaZe.
MaZe,
I tried your suggestion, and it appears that with LD_ASSUME_KERNEL=2.4.1 things quit complaining about the library. I have no idea what is happening in that respect, but something happens. I'm not seeing the "normal" message from real.exe about not having an input namelist however, so it may or may not work. Running the exe's does however give me a "starting wrf task 0 of 1", and the other .exe's, real/ndown, etc., also do the same thing. I'll have to muck around in the startup script and see what happens when changing those. Also, since mpi is not running, nor do I have OMP set, it *might* just take off and work. I've got to pass this info to my compadre in crime and let him know, as there may be some other things to look at besides the message passing protocols. I really thank you for the suggestion.
On another note, you indicate perhaps we should not be using the tls version? Can you expand on that a bit as to why, and what the tls is?
Maciej Żenczykowski wrote:
On Sat, 1 Oct 2005, Sam Drinkard wrote:
ok, tls is 'thread local storage' - basically this is a memory area which is not shared between different threads of the application (very useful).
This is much easier with(?)/requires(?) kernel support to work, which apparently was introduced in kernel 2.4.2. Furthermore this kernel support also needs to be supported/used by the threading libraries.
If everything is beautiful then a new kernel / library / loader will automagically detect whether tls is supported on the current machine or not and run with or without tls accordingly. However this seems to often be broken for stuff compiled a long time ago or with different compilers (non gcc). As such via the LD_ASSUME_KERNEL=2.4.1 argument you are telling the runtime linker/loader (ld-linux) that the kernel you are using is too old to support tls. Thus ld-linux (which does a lot of magic by first looking for /lib libs in /lib/ix86/tls then /lib/tls then /lib/ix86 and only then /lib for the proper x value for your cpu) skips the tls directories which contain the tls-enabled libraries - thus effectively turning off tls support (and possibly other stuff).
I guessed this was the solution based on looking through the strings output of the different libraries from your error message. The fact that I apparently hit it was in the end pure luck (and a very good educated guess...). I couldn't exactly duplicate and/or explain my reasoning (although I have had other programs, like iscan, which only work with the LD_ASSUME_KERNEL=2.4.1 option).
You can also try 'export LD_ASSUME_KERNEL=2.4.1' somewhere at the top of whatever runscript the programs are using.
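As a rough sketch of the run-script idea, a small wrapper like the one below keeps the setting confined to the commands that need it. The name run_without_tls is made up for illustration. Whether an MPI launcher passes the variable on to the processes it starts is an assumption that would need checking for the particular MPICH setup.

```shell
#!/bin/sh
# run_without_tls (hypothetical helper): run one command with
# LD_ASSUME_KERNEL=2.4.1 set and exported. The subshell keeps the
# setting from leaking into the rest of the calling script.
run_without_tls() {
    (
        LD_ASSUME_KERNEL=2.4.1
        export LD_ASSUME_KERNEL
        "$@"
    )
}
```

In a run script this would be used as "run_without_tls ./real.exe" followed by "run_without_tls ./wrf.exe", leaving every other command in the script to run with the normal libraries.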
Cheers, MaZe.
I just got through reading several articles on the assume kernel, and your explanation was much more understandable than all of the others. After seeing it is just an environment variable that can be set, most likely the scripts that call the .exe's will handle that accordingly. I'm just not up to speed on all the scripts that my buddy has installed to run this behemoth. I did run one test, and it passed without any problems, so I'm pretty comfortable that will most likely prevent me from dropping back a version or 2 and going to the 32-bit version. If I can get one more fortran piece of code to build a shared object, I'll be happy as a pig in slop :-) That part is going to take a bit more doing tho I expect, as it's talking about a relocated object that can't be linked to something or other. Several other ideas have failed to build the thing including wrapit from the NCL libraries. Once I get past this part, if you don't mind me calling on you, I'd appreciate you telling me what can be done with the one little fortran script that does the .so.
Maciej Żenczykowski wrote:
MaZe,
It appears that from my minimal testing, the environment of 2.4.1 kernel will permit the software to run. Now, how to implement that into both an mpi or open mp environment, and from scripts? You've whetted my quest for knowing why/what is happening w/r/t the library that it needed?
Cheers!
Sam
./real.exe: /usr/pgi/linux86-64/6.0/lib/libpthread.so.0: version `GLIBC_2.3.3' not found (required by /lib64/tls/librt.so.1)
I think I know where the problem lies:
on FC3 i386 system:
# find /lib | grep libpthread
/lib/i686/libpthread.so.0
/lib/i686/libpthread-0.10.so
/lib/libpthread.so
/lib/tls/libpthread.so.0
/lib/tls/libpthread-2.3.5.so
/lib/libpthread.so.0
/lib/libpthread-0.10.so
obviously we have tls and non-tls versions of libpthread. Apparently the /usr/pgi does not have a tls version of libpthread (which should be in /usr/pgi/linux86-64/6.0/lib/tls/libpthread.so.0) thus causing tls to fail and come into conflict with the default tls enabled librt from glibc.
The 'proper' solution would be for the compiler to come with a tls enabled libpthread (maybe report a bug) - the 'improper' is to make the ld-linux loader skip trying the tls glibc libraries. Either by turning off tls (LD_ASSUME_KERNEL=2.4.1) or by preloading the correct non-tls libraries (something like LD_PRELOAD=/lib64/librt.so.1 ...) possibly having to list more than just one library (separated by colons, or by spaces inside double quotes)
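If more than one library does need preloading, the list can be assembled with a small helper. This is a sketch only: join_preload is a made-up name, and which non-tls libraries actually have to be listed for real.exe is not known here.

```shell
#!/bin/sh
# join_preload (hypothetical helper): join its arguments with colons,
# producing a value suitable for LD_PRELOAD.
join_preload() {
    list=
    for lib in "$@"; do
        if [ -z "$list" ]; then
            list=$lib
        else
            list=$list:$lib
        fi
    done
    printf '%s\n' "$list"
}
```

It could then be used as LD_PRELOAD=$(join_preload /lib64/librt.so.1 /some/other/lib.so) ./real.exe, where the second path is only a placeholder.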
Cheers, MaZe.
On Sat, 2005-10-01 at 08:32, Sam Drinkard wrote:
If you don't have it already, try:
yum install compat-glibc
If that doesn't fix it, CentOS 3.x is as close as you'll get to RHES/RHEL 3 and it is still actively being supported with updates.
See 'yum info compat-glibc' for how to compile/link.
Les,
Thanks for the hint, but I already have the compat libs installed, and unfortunately, the 2.3.3 is not amongst them. IIRC, there were no listings for a 2.3.3 glibc for CentOS on any of the google sites I checked out. Several other distros, but none here. I'm grabbing the 3.5 /i386 bits now and will likely drop back to that version. I guess the weather development community is not quite ready for prime-time 64-bit OS's under linux, but other *nixes are already supported.
Les Mikesell wrote:
As far as I know Les, the 2.3.3 does exist, but not under RH, Centos or the "big 3" for that matter. I recall seeing a few rpm pkgs for some distros I've never heard of. As for it being exactly what is needed, I can't say anything for sure, other than what the runtime error prints out. I've looked in the librt.so.1, and it does in fact have an external call to 2.3.3, but beyond that, I dunno. It could well be a compiler error too. Many of the folks who have had success building and running WRF are doing so under an earlier version of the PG compiler, I think pre-6.0. Since I don't own the portland group compiler, just using the trial version, I'm not entitled to any support from them, and as I stated earlier, there has been little to no discussion about late date versions of Linux. The other thing that is secondary to the WRF build is some fortran utilities and one shared object file that will not build under the x86-64 due to some relocation issues. I think all the code that I'm attempting to build/run was pretty well tested out on the i386 versions of several OS', but primarily RH3.
On another note, I do have folks that have sent me some configuration files for the MM5 model running under the x86-64 RH distro, and they all say it really kicks ass compared to the 32-bit builds, but that is again, using an earlier version of the Intel compiler and earlier code to boot. I'm not one to stay on the bleeding edge of things, but I figured there would not be too much change from 3 to 4, and I'm finding out different. As for the actual 2.3.3 library, I don't know if it will be in the 3.5 distro of CentOS or not, but it's worth a try.
Les Mikesell wrote:
Sam Drinkard wrote:
I have used the PGI F77 compiler under both trial & license, & I *think* they will provide limited support for the trial version, I would try them.