On 07/09/06, Jim Perrin jperrin@gmail.com wrote:
On 9/6/06, Sebastien Tremblay sebastien.tremblay@au.cmpmedica.com
wrote:
Hi all,
I'm not very used to CentOS, or Linux generally speaking, though I
read a lot and 'man' quickly became a very good friend of mine, right
after Google — so sorry if this is a ridiculous question! (At least
it'll give you a good laugh!)
I updated the kernel to 2.6.9-42.0.2 recently and that raised a
question... simple, yet silly! What exactly is the difference
between 2.6.9-42.0.2.EL and 2.6.9-42.0.2.ELsmp? It seems the default
install was the 'smp' kernel, but as I had to change the grub config
I stumbled across this one and... started wondering...
SMP is for dual processor (or dual core) systems. The default EL kernel is for a single processor system, but is installed as a failsafe kernel on smp systems "just in case".
AFAIK, running the smp kernel on a single-core machine with hyper-threading
switched on is also a good idea.
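One way to tell whether the extra processors the kernel reports are HT siblings or real cores is to compare the number of logical processors against the number of distinct (physical id, core id) pairs in /proc/cpuinfo. A minimal sketch, using a made-up two-entry cpuinfo from a single-core HT P4 (the sample text and the field ordering are assumptions; real 2.6-kernel output has many more fields):

```python
# Sketch: distinguish hyper-threaded siblings from physical cores by
# parsing /proc/cpuinfo-style text. SAMPLE_CPUINFO is hypothetical:
# two logical processors sharing one (physical id, core id) pair,
# i.e. a single-core CPU with HT enabled.
SAMPLE_CPUINFO = """\
processor\t: 0
physical id\t: 0
core id\t: 0

processor\t: 1
physical id\t: 0
core id\t: 0
"""

def count_cpus(cpuinfo_text):
    """Return (logical, physical) CPU counts from cpuinfo-style text.

    Assumes 'physical id' precedes 'core id' within each entry, as on
    2.6 kernels.
    """
    logical = 0
    cores = set()
    phys = None
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys = value
        elif key == "core id":
            cores.add((phys, value))
    return logical, len(cores)

if __name__ == "__main__":
    logical, physical = count_cpus(SAMPLE_CPUINFO)
    print(logical, physical)  # 2 1 -> two logical CPUs, one core: HT, not true SMP
```

If logical equals physical, the extra "CPUs" the smp kernel shows are real; if logical is double, you are looking at HT siblings.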
Awesome... Next server has HT.. So I'll give that a try!
Cheers for your time!
Seb.
Awesome... Next server has HT.. So I'll give that a try!
Keep in mind that HT is its own special brand of hell, and isn't really *true* SMP. The kernel will work fine, and you'll see 2x the processors, but at best you'll get about a 3% performance boost, and at worst it'll actually hurt performance. Some motherboards have buggy HT implementations and may cause some locking or slowness. Sometimes this can be resolved by appending acpi=ht to your kernel boot line. With HT you also have the possibility of cache thrashing, which can/will impact performance. Make sure you know what you're getting into.
Jim Perrin wrote:
Awesome... Next server has HT.. So I'll give that a try!
Keep in mind that HT is its own special brand of hell, and isn't really *true* SMP. The kernel will work fine, and you'll see 2x the processors, but at best you'll get about a 3% performance boost, and at worst it'll actually hurt performance. Some motherboards have buggy HT implementations and may cause some locking or slowness. Sometimes this can be resolved by appending acpi=ht to your kernel boot line. With HT you also have the possibility of cache thrashing, which can/will impact performance. Make sure you know what you're getting into.
It's not all loss. I have a small benchmark written in Perl to exercise the CPU a little. I have here two MUTs (machines under test): Mopoke is a Dell Pentium 4 3.00 GHz with HT enabled, running SUSE 10.1. Bilby is a Sempron 2500+, so it's a bit slower. Also, it's running roughly Nahant (RHEL 4), so compiled with an older (slower?) gcc and a different perl.
What I'm illustrating here is the difference HT can make:
summer@Mopoke:~> time bm.perl & time bm.perl & wait
[1] 3480
[2] 3481

real    0m23.935s
user    0m23.689s
sys     0m0.004s

real    0m25.906s
user    0m24.746s
sys     0m0.004s
[1]-  Done                    time bm.perl
[2]+  Done                    time bm.perl
summer@Mopoke:~>
[summer@bilby ~]$ time bm.perl & time bm.perl & wait
[1] 10099
[2] 10100

real    0m49.343s
user    0m24.287s
sys     0m0.011s

real    0m49.371s
user    0m24.405s
sys     0m0.013s
[1]-  Done                    time bm.perl
[2]+  Done                    time bm.perl
[summer@bilby ~]$
Note that on Mopoke, user for each is about equal to elapsed, about what one would expect with dual-core or SMP.
On Bilby, user for each is about half elapsed, just as one would expect.
For those who like to play by themselves, here's the code:
[summer@bilby ~]$ cat bin/bm.perl
#!/usr/bin/perl
use integer;
$i = 0;
while ($i < 10000) {
    $j = 0;
    while ($j < 10000) {
        ++$j;
    }
    ++$i;
}
[summer@bilby ~]$
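John's run-two-copies-and-compare trick translates directly to other languages. Here is a rough Python sketch of the same methodology, not a faithful port: the loop sizes are scaled well down from bm.perl's 10000x10000 so it finishes quickly, and the absolute times are not comparable to the Perl numbers above.

```python
# Sketch: run the same CPU-bound loop once, then twice concurrently,
# and compare wall-clock times. On true SMP (or, partially, HT) the
# concurrent wall time stays near the single-run time; time-sliced on
# one logical CPU it roughly doubles.
import time
from multiprocessing import Pool

def busy(n):
    """Nested integer loop, analogous to bm.perl's while loops."""
    i = 0
    while i < n:
        j = 0
        while j < n:
            j += 1
        i += 1
    return i * n  # total inner iterations performed

if __name__ == "__main__":
    n = 2000
    t0 = time.perf_counter()
    busy(n)
    single = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool(2) as pool:
        pool.map(busy, [n, n])
    concurrent = time.perf_counter() - t0

    print(f"single: {single:.2f}s  two concurrent: {concurrent:.2f}s")
    print(f"throughput speedup: {2 * single / concurrent:.2f}x")
```

A speedup near 2x indicates real parallel execution; near 1x indicates the two copies serialized on one CPU.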
On Fri, 2006-09-08 at 10:59 +0800, John Summerfield wrote:
[...]
Hi John,
Comparing with your reference, a Dell Dimension 3100 (P4 HT, 3.0 GHz, 2 MB L2, 2 GB RAM), but with acpi=ht enabled, comes out slightly lower than your SUSE box.
I'll re-run on the new 42.0.2 in the coming weeks without acpi=ht to see the differences.
FYI, I'd been working without it for a long time without issues on 34.0.2; I only added this flag recently, for security/stability, after reading others' notes.
[root@sparkbox ~]# uname -a
Linux sparkbox.stigmatedbrain.net 2.6.9-34.0.2.ELsmp #1 SMP Fri Jul 7 19:52:49 CDT 2006 i686 i686 i386 GNU/Linux
[root@sparkbox ~]# free
             total       used       free     shared    buffers     cached
Mem:       2065384    1708756     356628          0       6924     191476
-/+ buffers/cache:    1510356     555028
Swap:      2031608     657924    1373684
[root@sparkbox ~]# cat /etc/grub.conf | grep 2.6.9-34.0.2.ELsmp
title CentOS (2.6.9-34.0.2.ELsmp)
        kernel /vmlinuz-2.6.9-34.0.2.ELsmp ro root=/dev/VolGroup00/LogVol00 selinux=0 vga=0x031a acpi=ht
        initrd /initrd-2.6.9-34.0.2.ELsmp.img
[root@sparkbox ~]# cat /var/log/dmesg | grep "CPU: L2"
CPU: L2 cache: 2048K
CPU: L2 cache: 2048K
[root@sparkbox ~]# cat bin/cpubench.pl
#!/usr/bin/perl
# Usage on HT/MultiCPU host:
#   time cpubench.pl & time cpubench.pl & wait
use integer;
$i = 0;
while ($i < 10000) {
    $j = 0;
    while ($j < 10000) {
        ++$j;
    }
    ++$i;
}
[root@sparkbox ~]# cat /etc/issue | grep CentOS
CentOS release 4.3 (Final)
[root@sparkbox ~]# time cpubench.pl & time cpubench.pl & wait
[1] 8003
[2] 8005

real    0m30.166s
user    0m16.680s
sys     0m0.085s

real    0m30.809s
user    0m16.606s
sys     0m0.066s
[1]-  Done                    time cpubench.pl
[2]+  Done                    time cpubench.pl
Cheers,
Jose.
On Fri, Sep 08, 2006 at 10:59:10AM +0800, John Summerfield wrote:
It's not all loss. I have a small benchmark written in Perl to exercise the CPU a little.
[...]
use integer;
This exercises the CPU only a very, very little -- a tight loop of integer math with no system calls or anything.
Change this to do floating point math and watch what happens.
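To make the contrast concrete, here is a rough Python sketch putting the two workloads side by side: an integer-only loop versus one calling sin()/log(). The loop bound and the specific functions are illustrative choices; the interpreter adds its own overhead, so treat this as showing the shape of the test, not raw FPU throughput. The relevance to HT is that the two logical CPUs of a P4 share one set of FP execution units, so the floating-point variant is where running two copies at once is expected to show contention.

```python
# Sketch: integer-only loop vs floating-point transcendentals,
# timed separately on one CPU.
import math
import timeit

def int_loop(n):
    """Pure integer arithmetic, like bm.perl."""
    total = 0
    for i in range(n):
        total += i
    return total

def fp_loop(n):
    """Floating-point transcendentals: sin() and log()."""
    total = 0.0
    for i in range(1, n + 1):
        total += math.sin(i) + math.log(i)
    return total

if __name__ == "__main__":
    n = 200_000
    t_int = timeit.timeit(lambda: int_loop(n), number=1)
    t_fp = timeit.timeit(lambda: fp_loop(n), number=1)
    print(f"integer loop: {t_int:.3f}s   fp loop: {t_fp:.3f}s")
```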
On Fri, 2006-09-08 at 10:05 -0400, Matthew Miller wrote:
On Fri, Sep 08, 2006 at 10:59:10AM +0800, John Summerfield wrote:
It's not all loss. I have a small benchmark written in Perl to exercise the CPU a little.
[...]
use integer;
This exercises the CPU only a very, very little -- a tight loop of integer math with no system calls or anything.
Change this to do floating point math and watch what happens.
Hi folks,
Just for fun, I found at http://samba.org/junkcode/ an interesting and simple test, http://samba.org/ftp/unpacked/junkcode/speed.c, which I compiled as follows:
[root@sparkbox bin]# export LANG=C; gcc -v -O2 -lm -o speed speed.c; export LANG=es_ES.UTF-8
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.5/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=i386-redhat-linux
Thread model: posix
gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)
 /usr/libexec/gcc/i386-redhat-linux/3.4.5/cc1 -quiet -v speed.c -quiet -dumpbase speed.c -auxbase speed -O2 -version -o /tmp/ccNWoHQq.s
ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/3.4.5/../../../../i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i386-redhat-linux/3.4.5/include
 /usr/include
End of search list.
GNU C version 3.4.5 20051201 (Red Hat 3.4.5-2) (i386-redhat-linux)
        compiled by GNU C version 3.4.5 20051201 (Red Hat 3.4.5-2).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
 as -V -Qy -o /tmp/ccoiuoVT.o /tmp/ccNWoHQq.s
GNU assembler version 2.15.92.0.2 (i386-redhat-linux) using BFD version 2.15.92.0.2 20040927
 /usr/libexec/gcc/i386-redhat-linux/3.4.5/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o speed /usr/lib/gcc/i386-redhat-linux/3.4.5/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/3.4.5/../../../crti.o /usr/lib/gcc/i386-redhat-linux/3.4.5/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/3.4.5 -L/usr/lib/gcc/i386-redhat-linux/3.4.5 -L/usr/lib/gcc/i386-redhat-linux/3.4.5/../../.. -lm /tmp/ccoiuoVT.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i386-redhat-linux/3.4.5/crtend.o /usr/lib/gcc/i386-redhat-linux/3.4.5/../../../crtn.o
Here are the results for the aforementioned host, using acpi=ht:
[root@sparkbox bin]# time speed & time speed & wait
[1] 29836
[2] 29838
Floating point - sin() - 10.3853 MOPS
Floating point - log() - 8.67528 MOPS
Floating point - sin() - 1.93525 MOPS
Floating point - log() - 9.25754 MOPS
Memcpy - 1kB - 2542.87 Mb/S
Memcpy - 1kB - 1238.97 Mb/S
Memcpy - 100kB - 1933.49 Mb/S
Memcpy - 100kB - 2237.83 Mb/S
Memcpy - 1MB - 735.953 Mb/S
Memcpy - 1MB - 783.704 Mb/S
Memcpy - 10MB - 547.606 Mb/S
Adding integers - 563.318 MOPS
Memcpy - 10MB - 534.85 Mb/S
Adding integers - 488.456 MOPS
Adding floats (size 4) - 74.4939 MOPS
Adding doubles (size 8) - 59.9683 MOPS

real    0m4.887s
user    0m2.431s
sys     0m0.063s
Adding floats (size 4) - 22.3984 MOPS
[1]-  Exit 13                 time speed
Adding doubles (size 8) - 46.3919 MOPS

real    0m5.005s
user    0m2.389s
sys     0m0.056s
[2]+  Exit 13                 time speed
[root@sparkbox bin]#
[root@sparkbox bin]# time speed
Floating point - sin() - 10.6428 MOPS
Floating point - log() - 8.5237 MOPS
Memcpy - 1kB - 3440.33 Mb/S
Memcpy - 100kB - 3964.34 Mb/S
Memcpy - 1MB - 1585.58 Mb/S
Memcpy - 10MB - 971.622 Mb/S
Adding integers - 608.727 MOPS
Adding floats (size 4) - 48.3473 MOPS
Adding doubles (size 8) - 66.073 MOPS

real    0m2.622s
user    0m2.212s
sys     0m0.052s
[root@sparkbox bin]#
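For readers who want the shape of such a test without compiling C, here is a rough Python sketch of the memcpy portion of a speed.c-style benchmark. The buffer sizes are arbitrary, and Python's overhead makes the absolute figures incomparable to the C numbers above; the point is the structure: copy a buffer repeatedly and report bandwidth.

```python
# Sketch: memcpy-style bandwidth microbenchmark. The bulk slice
# assignment is performed in C inside CPython, so for large buffers
# this mostly measures memory copy speed.
import time

def copy_bandwidth(size, repeats):
    """Copy a `size`-byte buffer `repeats` times; return MB/s."""
    src = bytes(size)
    dst = bytearray(size)
    t0 = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # bulk copy of the whole buffer
    elapsed = time.perf_counter() - t0
    return size * repeats / elapsed / 1e6

if __name__ == "__main__":
    for size in (1_000, 100_000, 1_000_000):
        print(f"{size:>9} bytes: {copy_bandwidth(size, 200):.0f} MB/s")
```

As with speed.c, running two copies concurrently and watching the per-size numbers drop shows the two HT siblings competing for the same memory and cache bandwidth.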
I know there's a new release of gcc in U4 and I'm still using the old 4.3, but I'll do the update in the next weeks...
Good weekend all,
Jose
Hi John,
John Summerfield wrote:
summer@Mopoke:~> time bm.perl & time bm.perl & wait
[1] 3480
[2] 3481

real    0m23.935s
user    0m23.689s
sys     0m0.004s

real    0m25.906s
user    0m24.746s
sys     0m0.004s
[1]-  Done                    time bm.perl
[2]+  Done                    time bm.perl
What are the results for a single "time bm.perl" ?
These are my results, first on an HT P4:

root@ihbids /tmp# uname -a
Linux ihbids.ihbi.qut.edu.au 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 00:17:26 CDT 2006 i686 i686 i386 GNU/Linux
root@ihbids /tmp# time ./bm.perl

real    0m12.783s
user    0m12.765s
sys     0m0.002s
root@ihbids /tmp# time ./bm.perl & time ./bm.perl & wait
[1] 29798
[2] 29799

real    0m21.422s
user    0m21.395s
sys     0m0.003s
[1]-  Done                    time ./bm.perl

real    0m21.589s
user    0m20.467s
sys     0m0.003s
[2]+  Done                    time ./bm.perl
On a system with real SMP (Opteron, dual core, dual socket):

root@basilisk /tmp# uname -a
Linux basilisk.ihbi.qut.edu.au 2.6.9-34.0.2.ELsmp #1 SMP Fri Jul 7 18:22:55 CDT 2006 x86_64 x86_64 x86_64 GNU/Linux
root@basilisk /tmp# time ./bm.perl

real    0m14.309s
user    0m14.095s
sys     0m0.121s
root@basilisk /tmp# time ./bm.perl & time ./bm.perl & wait
[1] 26588
[2] 26589

real    0m14.232s
user    0m14.178s
sys     0m0.001s
[1]-  Done                    time ./bm.perl

real    0m14.256s
user    0m14.194s
sys     0m0.026s
[2]+  Done                    time ./bm.perl
root@basilisk /tmp# time ./bm.perl & time ./bm.perl & time ./bm.perl & time ./bm.perl & wait
[1] 26592
[2] 26593
[3] 26594
[4] 26595

real    0m14.164s
user    0m14.125s
sys     0m0.009s

real    0m14.597s
user    0m14.149s
sys     0m0.071s

real    0m15.020s
user    0m14.335s
sys     0m0.030s
[1]   Done                    time ./bm.perl
[3]-  Done                    time ./bm.perl
[4]+  Done                    time ./bm.perl

real    0m15.143s
user    0m14.168s
sys     0m0.196s
[2]+  Done                    time ./bm.perl
I believe my results show that HT does almost nothing at all. The P4 box does seem to have HT enabled -- it is showing two CPUs in /proc/cpuinfo and:

root@ihbids /tmp# dmesg | grep 'CPU: L'
CPU: L2 cache: 2048K
CPU: L2 cache: 2048K
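As a back-of-envelope check, the throughput gain from running two copies can be computed from the wall-clock figures quoted in this thread as 2 x (single-run wall time) / (concurrent wall time). Using the P4 HT numbers (12.783s single, 21.589s for the slower of the two concurrent runs) and the Opteron numbers (14.309s single, 14.256s concurrent):

```python
# Quick arithmetic on the measured times: throughput speedup from
# running two copies at once, HT P4 vs real SMP Opteron.
single_ht, concurrent_ht = 12.783, 21.589    # P4 with HT
single_smp, concurrent_smp = 14.309, 14.256  # dual-core, dual-socket Opteron

speedup_ht = 2 * single_ht / concurrent_ht    # ~1.18x
speedup_smp = 2 * single_smp / concurrent_smp  # ~2.01x
print(f"HT speedup: {speedup_ht:.2f}x   SMP speedup: {speedup_smp:.2f}x")
```

So on this workload HT bought roughly an 18% throughput gain against the ~100% gain of real SMP — "almost nothing" is only a slight exaggeration.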
Regards Robert