On Wednesday, September 05, 2012 11:48:41 AM Lamar Owen wrote:
I'll shoot you an e-mail a little later today or tomorrow about what I've found thus far, and a couple of questions; that is, once I get caught up from a long weekend.... :-)
Thanks for the offer, it is most appreciated...
An update for the -devel list.... I've been keeping Karanbir updated, but this might have a little wider interest, now that I'm actually getting somewhere.
After having a difficult time getting a build started on our 20 CPU Altix 3700, I punted and tried doing the same builds on our smaller 2 CPU Altix 3200 (same basic machine as the 3700, just smaller). Using the last Scientific Linux CERN 5.4 IA64 build as my base, once installed I was able to successfully build the kernel, which I had not been able to do on the larger box. This pointed to either an SMP/NUMA issue with IA64 Linux, or an issue with the 20 CPU box's hardware.
Since we have another large box (30 CPU Altix 350) that was successfully running tests on Debian 6, and since I'd really prefer not to use Debian (but I can go back to it if I have to), I put SLC 5.4 on it, which was a little more of an adventure due to the partitioning. After getting the build environment set up, I was indeed able to successfully build the CentOS 5.8 updated kernel, which I had not been able to do on the 20 CPU box, but had done, much more slowly, on the 2 CPU box (many hours on the 2 CPU box; less than 2 hours on the 30 CPU box, and most of that time was disk I/O writing out the binary RPMs). Hmm, it seems something is wrong, hardware-wise, with the 20 CPU box.
After a few false starts, I got mock up and running, version 1.0.28 from EPEL (it's a noarch RPM; installing the RPM out of EPEL's x86_64 repo worked fine, just had to rebuild a couple of its dependencies). I then rolled smock.pl over to it; mockchain, while it looks like a good piece of software, apparently is too new for mock 1.0.28, so it's smock for now.
Now, the 5.8 glibc doesn't want to build using the SLC 5.4 binary RPMs as 'seed' for the buildroot; so after careful thought, and understanding how long this might take, I mirrored the SRPMS for CentOS 5.5, 5.6, and 5.7 (already have 5.8 down). Test builds of both the 5.5 kernel and glibc were successful, and so I set the box to building the full 'os' repo of the CentOS 5.5 SRPMS, using the SLC 5.4 binaries to 'seed' the buildroot. Once the 5.5 set is built, I'll re-seed the build root with the 5.5 binaries, and either rebuild 5.5 or 5.6, and then step up one rev at a time until I get 5.8 to rebuild. At that point, I plan to do an actual iso spin of 5.8 (internal use only at the moment, unless there is wider interest), and try to install it on the 2 CPU Altix 3200. Maybe 5.9 will be out by that time (I figure it will take at least a month to build things stepwise).
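For anyone wanting to follow along, the stepwise plan above can be written down as a short Python sketch. This is only an illustration of the ordering, not the scripts I'm actually running: the function name bootstrap_plan is made up for this example, and it assumes each step moves straight on to the next release rather than rebuilding the same release a second time against its own binaries.

```python
def bootstrap_plan(seed, targets):
    """Return (seed_release, target_release) pairs for a stepwise rebuild.

    Hypothetical helper: each target is built in a buildroot seeded with
    the binaries of the previous step, starting from the initial seed.
    """
    plan = []
    current_seed = seed
    for target in targets:
        plan.append((current_seed, target))
        # The binaries produced by this step seed the next buildroot.
        current_seed = target
    return plan


if __name__ == "__main__":
    steps = bootstrap_plan(
        "SLC 5.4",
        ["CentOS 5.5", "CentOS 5.6", "CentOS 5.7", "CentOS 5.8"],
    )
    for seed_release, target_release in steps:
        print(f"seed buildroot with {seed_release} binaries, rebuild {target_release} SRPMS")
```

Each pair names the binaries that seed the buildroot and the SRPMS set to rebuild with them, so the output of one step becomes the seed of the next.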
In any case, the local rebuild of CentOS 5.5 using the SLC 5.4 binaries as 'seed' started Saturday evening; as of 8:20 AM today it has successfully rebuilt 443 source RPMs, producing 1243 binary RPMs, and 35 packages have failed thus far. I think most of those failures are due to my forgetting that m4 version 1.4.8 is required, while m4 1.4.5 is what the SLC 5.4 repos provide, so once this first pass completes I'll retry the failed packages (very easy to do with smock).
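The m4 mismatch is the kind of thing a quick check before the run would have caught. A minimal sketch in Python, assuming plain dotted numeric versions; the helper names are made up for this example, and RPM's own version comparison handles more cases (epochs, release tags, letters) than this does:

```python
def parse_version(text):
    """Split a dotted numeric version such as '1.4.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_requirement(provided, required):
    """Return True if the provided version is at least the required one."""
    return parse_version(provided) >= parse_version(required)


if __name__ == "__main__":
    # m4 1.4.5 is what the seed repo provides; 1.4.8 is what is required.
    print(meets_requirement("1.4.5", "1.4.8"))
```

Comparing tuples of integers rather than raw strings matters once a component reaches two digits, since "1.4.10" sorts before "1.4.8" as a string.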
The biggest help was having SLC 5.4 available in an installable form, even though I had to do it as a network install; the install DVD didn't boot on the Altix systems, but the small boot.iso did, and with the packages served from a webserver here, an HTTP install was quick and painless. The CERN folk, especially Jaroslaw Polok, did the biggest part of the grunt work there, and I thank the CERN team.
Oh, and if you're interested in this sort of thing, pics of both the 30 CPU and the 2 CPU boxen (they occupy the same rack) can be seen a little way down the page at: http://forums.nekochan.net/viewtopic.php?f=14&t=16725868