Hi list!
I am running CentOS 4.1 x86_64 on an Athlon64.
The box was installed only yesterday. Yesterday I was still able to compile stuff; today even the tiniest and simplest programs fail to compile, always with internal compiler error: Segmentation fault.
For example:
  make -C /usr/src/linux-2.6 SUBDIRS=/tmp/bristuff-0.2.0-RC8n/qozap ZAP=-I/tmp/bristuff-0.2.0-RC8n/zaptel-1.0.9 modules
  make[1]: Entering directory `/usr/src/kernels/2.6.9-11.EL-x86_64'
    CC [M]  /tmp/bristuff-0.2.0-RC8n/qozap/qozap.o
  In file included from include/linux/types.h:13,
                   from include/linux/kernel.h:13,
                   from /tmp/bristuff-0.2.0-RC8n/qozap/qozap.c:13:
  include/linux/posix_types.h:41: internal compiler error: Segmentation fault
The 'funny' thing is that the problem is in different modules for each program like:
  /usr/include/stdlib.h:757: internal compiler error: Segmentation fault
  include/linux/posix_types.h:41: internal compiler error: Segmentation
  /usr/include/gconv.h:72: internal compiler error:
Any ideas as to what might be broken on the box?
Thanks! Remco
Hi,
On Tue, Aug 16, 2005 at 10:19:14AM +0200, Remco Barendse wrote:
> Hi list!
> The box was installed only yesterday. Yesterday I was still able to compile stuff; today even the tiniest and simplest programs fail to compile, always with internal compiler error: Segmentation fault.
> The 'funny' thing is that the problem is in different modules for each program like:
>   /usr/include/stdlib.h:757: internal compiler error: Segmentation fault
>   include/linux/posix_types.h:41: internal compiler error: Segmentation
>   /usr/include/gconv.h:72: internal compiler error:
> Any ideas as to what might be broken on the box?
From experience I'd say a few things might be the problem:
a) Incompatible memory or too tight timings for dual channel to work properly (if it's dual-channel). I, for example, have to reduce the timing on one dual-core Athlon64 from 1T to 2T or the box isn't stable, even with (manually) lowered DDR speeds. Not even with BH-5 chips.
b) The CPU itself is overheating, which makes gcc's internal checking fail. This should manifest itself differently though (the signal 11 syndrome).
I do remember having those segfaults with too tight memory timings sometimes. Usually the box just hard locked every time in less than an hour. The 1GB setup (2x512MB) was fine with any timing if not using dual-channel, but dual-channel just didn't work with default timings. The mobo here is a DFI LanParty (something PCI-E) which has been happy with 4x512MB dual-sided DDR-400 (should be reduced to 333 though) after disabling the 1T timings.
I can't say that this is the problem, but from the symptoms I'd pretty much say it is.
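One way to probe the hardware theory above is to repeat the exact same compile several times and compare where gcc crashes. A crash site that moves between identical runs is commonly taken as a hint towards flaky memory or heat, while a crash that always lands on the same line points more towards a broken file or toolchain. The following is only a minimal sketch of that comparison: the function names are made up for illustration, and it assumes you have already captured the compiler output of each run as text.

```python
import re

# Matches gcc lines such as:
#   include/linux/posix_types.h:41: internal compiler error: Segmentation fault
ICE_RE = re.compile(r"^(\S+?):(\d+): internal compiler error", re.MULTILINE)


def ice_locations(compiler_output):
    """Return (file, line) pairs for every internal compiler error reported."""
    return [(m.group(1), int(m.group(2))) for m in ICE_RE.finditer(compiler_output)]


def crash_site_is_stable(outputs):
    """True if every captured run reports the same crash locations.

    `outputs` is a list of compiler output strings, one per identical run.
    """
    sites = {tuple(ice_locations(text)) for text in outputs}
    return len(sites) == 1
```

Feeding it the output of, say, five runs of the same make command gives a quick yes/no answer; it is a hint, not proof either way.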
Hi Pasi!
Nice hearing you again :)
Thanks for the input. This box has been running Centos 3.5 for over a year now without any stability problems (running plain vanilla kernel 2.6). It's a single core Athlon64.
I guess that would pretty much rule out a hardware problem? I'll try to reduce the timings anyway.
Cheers! Remco
Hi,
On Tue, Aug 16, 2005 at 11:05:51AM +0200, Remco Barendse wrote:
> Nice hearing you again :)
I'd say it's the other way around. When I stopped with Tao Linux, I think you still continued to use it :)
> Thanks for the input. This box has been running Centos 3.5 for over a year now without any stability problems (running plain vanilla kernel 2.6). It's a single core Athlon64.
> I guess that would pretty much rule out a hardware problem? I'll try to reduce the timings anyway.
I'd personally rule out CentOS-4 itself not working all right. It might be the hardware/BIOS combination too. I did have an Epox board and Athlon64 which was not stable (== not even week-long uptimes) with any memory timings until a BIOS update fixed it.
Another thing ruling CentOS out, for my part, is that when I stress test my new hardware, I usually do a funbuild of the CentOS codebase, which usually works aOK when the hardware is working aOK. Lately I have been considering this too little work, as the dual-core Athlon64 crunches through this codebase just too quickly (12h or something). This part of 'my unintentional testing' pretty much rules out the CentOS-4 distribution being somehow faulty.
So maybe it's something triggered by the CentOS-4 kernel, which is now quite radically different from the vanilla kernel sources? I really don't know. The symptoms are pretty much the same as what I saw while iterating through the problems on the dual-core a few weeks ago.
> On Tue, Aug 16, 2005 at 11:05:51AM +0200, Remco Barendse wrote:
>> Nice hearing you again :)
> I'd say it's the other way around. When I stopped with Tao Linux, I think you still continued to use it :)
Indeed, but I migrated the production Tao 1.0 boxes to Centos 3.x. I don't see any reason to stop using it :) I do miss the x86_64 kernel with 3Ware support for CentOS 3.x :)
>> Thanks for the input. This box has been running Centos 3.5 for over a year now without any stability problems (running plain vanilla kernel 2.6). It's a single core Athlon64.
>> I guess that would pretty much rule out a hardware problem? I'll try to reduce the timings anyway.
> I'd personally rule out CentOS-4 itself not working all right. It might be the hardware/BIOS combination too. I did have an Epox board and Athlon64 which was not stable (== not even week-long uptimes) with any memory timings until a BIOS update fixed it.
> Another thing ruling CentOS out, for my part, is that when I stress test my new hardware, I usually do a funbuild of the CentOS codebase, which usually works aOK when the hardware is working aOK. Lately I have been considering this too little work, as the dual-core Athlon64 crunches through this codebase just too quickly (12h or something). This part of 'my unintentional testing' pretty much rules out the CentOS-4 distribution being somehow faulty.
> So maybe it's something triggered by the CentOS-4 kernel, which is now quite radically different from the vanilla kernel sources? I really don't know. The symptoms are pretty much the same as what I saw while iterating through the problems on the dual-core a few weeks ago.
Thinking further, you could be right (partially) :)
When messing around loading a kernel module I caused a kernel panic and the box was shutdown uncleanly. I *suspect* that this may have caused some corruption on the filesystem (even though I did force a filesystem check and got no errors) and damaged some important stuff.
Rebooting the box doesn't help, I keep getting the same errors. Yesterday before I crashed the box I was able to compile various bits without problems.
I think I will try nuking and re-installing the box first before messing with the timing settings or other stuff.
Thanks! Remco
Hi,
On Tue, Aug 16, 2005 at 01:09:08PM +0200, Remco Barendse wrote:
>> On Tue, Aug 16, 2005 at 11:05:51AM +0200, Remco Barendse wrote:
> Indeed, but I migrated the production Tao 1.0 boxes to Centos 3.x. I don't see any reason to stop using it :) I do miss the x86_64 kernel with 3Ware support for CentOS 3.x :)
The support is still there as I do maintain those kernels, but it's not available at installation time.
http://core.upi.iki.fi/out/kernel/
still has the latest kernel available. The set of supported features is a little reduced from what it was (like i dropped the directmapper some time ago), but XFS and 3ware are there.
> Thinking further, you could be right (partially) :)
> When messing around loading a kernel module I caused a kernel panic and the box was shutdown uncleanly. I *suspect* that this may have caused some corruption on the filesystem (even though I did force a filesystem check and got no errors) and damaged some important stuff.
rpm -Va
That actually walks through the fs and verifies whether anything has changed in files installed with rpm. There will be a lot of lines from various config files, so anyone running this should not get immediately alarmed that 'my box is 0wned' :)
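Since most of the noise comes from config files, the output can be filtered down to the lines that matter for a broken compiler: non-config files that are missing or whose checksum changed. Below is a rough sketch of such a filter. It assumes the usual verify output layout (a flag string or the word "missing", an optional one-letter marker such as "c" for config files, then the path); the function names and the sample lines in the usage note are invented for illustration and not taken from a real run.

```python
def parse_verify_line(line):
    """Split one line of `rpm -Va` output into (flags, marker, path).

    Returns None for lines that do not look like verify output.
    """
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None
    flags, rest = parts
    rest = rest.strip()
    marker = ""
    # An optional one-letter attribute marker (c = config file) precedes the path.
    if len(rest) > 2 and rest[0] in "cdglr" and rest[1] == " ":
        marker, rest = rest[0], rest[2:].lstrip()
    return flags, marker, rest


def suspicious_files(lines):
    """Keep non-config files that are missing or whose digest ('5') changed."""
    hits = []
    for line in lines:
        parsed = parse_verify_line(line)
        if parsed is None:
            continue
        flags, marker, path = parsed
        if marker != "c" and (flags == "missing" or "5" in flags):
            hits.append(parsed)
    return hits
```

Usage would be along the lines of saving the output with `rpm -Va > verify.txt` and passing `open("verify.txt")` to `suspicious_files`. A changed checksum on something like a compiler binary or a system header would be worth a closer look; changed config files are expected.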
DISCLAIMER: Off-topic, useless to CentOS, killfile me, etc...
On Tue, 2005-08-16 at 11:23 +0300, Pasi Pirhonen wrote:
> ... I for example have to reduce timing on one dual-core Athlon64 from 1T to 2T or the box isn't stable even with lowered (manually) DDR-speeds. Not even with BH-5 chips ... The mobo here is DFI LanParty (something PCI-E) which has been happy with 4x512MB dual-sided DDR-400 (should be reduced to 333 tho) after disabling the 1T timings.
Yes, you should either reduce your synchronous timings to DDR333 (PC2700) or reduce the number of DIMMs to 1 per channel (2 total).
JEDEC specifications only allow for 1 DIMM per DDR400 (PC3200) channel, or 2 DIMMs per DDR333/DDR266 (PC2700/PC2100) channel.
The only exception is if you use registered DIMMs, which allows twice as many to be used.
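The rule of thumb above can be written down as a small lookup. This is just a sketch that encodes the limits exactly as stated in this post (they have not been checked against the JEDEC documents), and the function names are made up for illustration.

```python
# Unbuffered DIMMs allowed per channel, as stated in the post above.
UNBUFFERED_DIMMS_PER_CHANNEL = {"DDR400": 1, "DDR333": 2, "DDR266": 2}


def max_dimms_per_channel(speed, registered=False):
    """Registered DIMMs are said to allow twice as many per channel."""
    limit = UNBUFFERED_DIMMS_PER_CHANNEL[speed]
    return limit * 2 if registered else limit


def within_limit(total_dimms, channels, speed, registered=False):
    """Check a DIMM population against the per-channel limit."""
    per_channel = -(-total_dimms // channels)  # ceiling division
    return per_channel <= max_dimms_per_channel(speed, registered)
```

For the 4x512MB dual-channel setup mentioned earlier, `within_limit(4, 2, "DDR400")` is False while `within_limit(4, 2, "DDR333")` is True, which matches the advice to drop to DDR333 or go down to two DIMMs.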
On Tue, 16 Aug 2005, Bryan J. Smith wrote:
> DISCLAIMER: Off-topic, useless to CentOS, killfile me, etc...
> On Tue, 2005-08-16 at 11:23 +0300, Pasi Pirhonen wrote:
>> ... I for example have to reduce timing on one dual-core Athlon64 from 1T to 2T or the box isn't stable even with lowered (manually) DDR-speeds. Not even with BH-5 chips ... The mobo here is DFI LanParty (something PCI-E) which has been happy with 4x512MB dual-sided DDR-400 (should be reduced to 333 tho) after disabling the 1T timings.
> Yes, you should either reduce your synchronous timings to DDR333 (PC2700) or reduce the number of DIMMs to 1 per channel (2 total).
> JEDEC specifications only allow for 1 DIMM per DDR400 (PC3200) channel, or 2 DIMMs per DDR333/DDR266 (PC2700/PC2100) channel.
> The only exception is if you use registered DIMMs, which allows twice as many to be used.
OK, just nuked the box. Did a reformat and a fresh install of CentOS 4.1. No changes to the BIOS or mem timings or anything. Now I am able to compile.
Guess it wasn't a timing problem. The box was horribly slow too; tarring and gzipping 600 MB took about 2 hours. The box had been on the internet without a firewall for approx 30 minutes, could it be r00ted that quick?