Performance issues/difference of two servers running same task (one is quicker)


Jobst Schmalenbach

4 Jul 2019

6:43 a.m.

I need some advice on what to do next, even if someone just tells me to check out (an)other mailing list(s) or a tuning site, or points me in a better direction for solving my annoying problem: one server is much faster at certain tasks despite running on "shitty" hardware.

I have tried many things to solve my issue, but haven't figured it out:
- changed buffer/pool/cache/etc. settings in mysqld
- changed Apache/PHP server settings
- changed various OS settings (sysctl), e.g. turned off IPv6

I have a development server (local) and live servers (in a data center), used mainly for many different websites and one online training site.

The development and live servers in question run the same software setup:
- CentOS Linux release 7.6.1810
- bind 32:9.9.4-74.el7_6.1
- Apache/2.4.6 (CentOS)
- PHP 7.1.29
- mysqld Ver 5.7.26
- WordPress, WooCommerce, WishList Member, Sensei, etc.
- all software is at the same update level
- even many of the Linux config files are the same (/etc/host, bind, etc.)
- the databases are copies/identical

- Live server: Dell PowerEdge M710, 48GB, 2x Xeon L5630, LSI RAID1 SSD
- Dev server: DIY, GIGABYTE MX31-BS0, 32GB, 1x Xeon E3-1245, MDADM RAID0 1TB Seagate spinners

Clearly the development server's hardware is way below the specs of the Dell, but software-wise they are identical (they get upgraded at the same time).

During normal operation (i.e. serving websites, online training courses, etc.) the DELL displays the websites faster, even though it sits 1000 km up north in a data center on a different network, while the local server is on the same network as my machine.

Yet the DEV server outshines the DELL when creating a few large custom tables: for small tables the local server takes 5 s while the DELL takes 15 s, and the times grow for bigger tables.

The task is based on:
- level, member, course and group are all IDs
- members can belong to a group and a level, and can access many courses
- the IDs restrict what they can access and what they belong to
- each member's course can be at various stages of completion
- using an API (WishList Member) that performs LOCAL calls when accessed locally, I can find out who belongs to what and assemble the info I need, then use PHP to build the table
- DB calls ARE LOCAL!

Now, when I try to create a table of members belonging to the same group/level, doing the same course at different stages of completion, the DELL takes on average 3 times longer to complete the table (normally about 20 to 30 rows).

I have put microtime() calls before and after certain calls, and the difference is visible:

DEV
Jul 04 04:57:26 UTC _members took 0.0005459785461425 ms
Jul 04 04:57:26 UTC _members took 0.0005321502685546 ms

LIVE
Jul 04 05:00:36 UTC _members took 0.0014369487762451 ms
Jul 04 05:00:36 UTC _members took 0.0013291835784912 ms

If I do this 300+ times, the outcome is very different.
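As a rough worked comparison of the four samples above, the sketch below averages each server's timings, prints the slowdown ratio and projects the mean onto 300 calls. It assumes each logged value is the difference of two `microtime(true)` calls, which PHP returns in seconds, so the "ms" label in the log may really mean seconds; if the values are milliseconds, the ratio is unchanged and only the projected totals scale. The `summarize_timings` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Average the _members timings per server, print the LIVE/DEV ratio and
# project the per-call mean onto N calls (default 300, as in "300+ times").
# Assumption: each value is a microtime(true) difference, i.e. seconds.
summarize_timings() {
  local calls="${1:-300}"
  awk -v calls="$calls" '
    $1 == "DEV" || $1 == "LIVE" { sum[$1] += $2; n[$1]++ }
    END {
      dev  = sum["DEV"]  / n["DEV"]
      live = sum["LIVE"] / n["LIVE"]
      printf "DEV  mean %.6f s/call, %d calls = %.3f s\n", dev,  calls, dev  * calls
      printf "LIVE mean %.6f s/call, %d calls = %.3f s\n", live, calls, live * calls
      printf "ratio %.2f\n", live / dev
    }'
}

# The four samples from the log above.
summarize_timings 300 <<'EOF'
DEV  0.0005459785461425
DEV  0.0005321502685546
LIVE 0.0014369487762451
LIVE 0.0013291835784912
EOF
```

With these samples the ratio comes out at about 2.57, and 300 calls add up to roughly 0.16 s versus 0.41 s, broadly in line with the "3 times longer" observed for whole tables.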

So my questions:

- How can the DELL take so much longer, although it runs on far better hardware?
- How can this be, when everything (software/OS/plugins) is the same?
- This even happens when the DELL is under low load (i.e. in the middle of the night) and only serves a few requests.

Same software, same config, same database, same amount of data in the database, yet on better hardware it's slower?

Any ideas anyone?

-- Jobst Schmalenbach


Simon Matter

4 Jul

7:07 a.m.


Hi


Two ideas:

a) the DELL may be faster overall, but if I'm right, its single-core speed is slower than on the DEV machine.

b) how do the LSI/SSD perform compared to the MDADM/RAID0 on the DEV server? I'm not sure the DELL is a clear winner here.
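Idea a) can be checked directly by timing the same single-threaded, CPU-bound loop on both machines while pinning it to one core, so that the extra cores on the DELL cannot help. A minimal sketch, assuming GNU `date` and, optionally, `taskset` from util-linux; the loop is arbitrary busy work standing in for the real PHP workload, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Time a fixed amount of single-threaded busy work, pinned to CPU 0 if
# taskset is available. Run the identical command on both servers: extra
# cores cannot speed this up, only per-core speed can.
busy_loop() {
  local n="$1" i sum=0
  for ((i = 0; i < n; i++)); do
    sum=$(( (sum + i * i) % 1000003 ))
  done
  echo "$sum"
}

single_core_bench() {
  local n="${1:-2000000}" start end
  start=$(date +%s%N)                      # GNU date: nanoseconds since epoch
  if command -v taskset >/dev/null 2>&1; then
    # Re-define busy_loop inside the pinned child shell and run it there.
    taskset -c 0 bash -c "$(declare -f busy_loop); busy_loop $n" >/dev/null
  else
    busy_loop "$n" >/dev/null              # fall back to an unpinned run
  fi
  end=$(date +%s%N)
  echo "$n iterations: $(( (end - start) / 1000000 )) ms"
}
```

For example, `single_core_bench 2000000` on each server; the iteration count is arbitrary, only the relative times matter.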

Regards, Simon

Jobst Schmalenbach

6 Jul

12:40 a.m.

On Thu, Jul 04, 2019 at 09:07:35AM +0200, Simon Matter via CentOS (centos@centos.org) wrote:

> Two ideas:
>
> a) the DELL may be faster overall, but if I'm right, its single-core speed is slower than on the DEV machine.

Yes, but since BOTH have "other" things to do at the same time, the sheer number of CPUs in the DELL should help.

> b) how do the LSI/SSD perform compared to the MDADM/RAID0 on the DEV server? I'm not sure the DELL is a clear winner here.

See my answer with the disk test results in the other email.

Roberto Ragusa

4 Jul

7:39 a.m.

On 7/4/19 8:43 AM, Jobst Schmalenbach wrote:

> Clearly the development server's hardware is way below the specs of the Dell, but software-wise they are identical (they get upgraded at the same time).

As a first step, you have to test subsystems one by one.

Try this to see how fast the CPU and kernel are (including Meltdown/Spectre mitigation slowdowns):

time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000

Then try this to see how fast your disks are for DB operations:

cd /a/directory/on/the/filesystem/you/want/to/test
time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
rm test
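The two tests above report totals; dividing by the number of operations gives a per-operation cost that is easier to compare between machines. A minimal sketch wrapping the same two `dd` tests, assuming GNU `date` (for `%N`) and a writable test directory; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Run the dd syscall test and the fsync test, report microseconds per operation.
elapsed_ns() {                      # run "$@" and print its wall time in ns
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( end - start ))
}

per_op_us() {                       # per_op_us TOTAL_NS COUNT -> microseconds/op
  awk -v ns="$1" -v n="$2" 'BEGIN { printf "%.2f\n", ns / n / 1000 }'
}

syscall_test() { dd if=/dev/zero of=/dev/null bs=1 count="$1"; }

fsync_test() {                      # fsync_test ITERATIONS DIRECTORY
  local n="$1" dir="$2" i
  for ((i = 0; i < n; i++)); do
    dd if=/dev/zero of="$dir/test" bs=1 count=1 conv=fsync
  done
  rm -f "$dir/test"
}

run_both() {                        # run_both [DIRECTORY], default: current dir
  local dir="${1:-.}" ns
  ns=$(elapsed_ns syscall_test 1000000)
  echo "dd 1-byte copy: $(per_op_us "$ns" 1000000) us per block"
  ns=$(elapsed_ns fsync_test 1000 "$dir")
  echo "fsync'd write:  $(per_op_us "$ns" 1000) us per write"
}
```

For example, `run_both /mnt` runs the fsync test in /mnt.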

Regards.

-- Roberto Ragusa mail at robertoragusa.it

Jobst Schmalenbach

6 Jul

12:37 a.m.

On Thu, Jul 04, 2019 at 09:39:18AM +0200, Roberto Ragusa (mail@robertoragusa.it) wrote:

> On 7/4/19 8:43 AM, Jobst Schmalenbach wrote:
>> Clearly the development server's hardware is way below the specs of the Dell, but software-wise they are identical (they get upgraded at the same time).
>
> As a first step, you have to test subsystems one by one.

Thank you for the tips. Here are the results (the DELL is faster overall):

> time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000

[DIY ~] #>time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000
real 0m1.931s
user 0m1.022s
sys 0m0.896s

[DELL ~] #>time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000
real 0m1.308s
user 0m0.389s
sys 0m0.919s

The DELL is faster overall.

> cd /a/directory/on/the/filesystem/you/want/to/test
> time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
> rm test

[DIY /mnt] #>time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
real 1m12.944s
user 0m1.604s
sys 0m2.595s

[DELL /mnt] #>time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
real 0m2.270s
user 0m0.509s
sys 0m1.475s

I expected the DIY to be slower here; it runs MDADM RAID1 on Seagate spinners, compared to LSI RAID1 on SSDs.
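Converting the `real` times above into per-operation latencies makes the gap concrete. A small sketch; the input columns (host, test, real seconds, operation count) are a hypothetical layout typed in from the results above, and `per_op` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Convert total "real" seconds into microseconds per operation.
# Input columns: host, test, real time in seconds, number of operations.
per_op() {
  awk '{ printf "%-4s %-6s %8.2f us/op\n", $1, $2, $3 / $4 * 1e6 }'
}

per_op <<'EOF'
DIY  dd     1.931  1000000
DELL dd     1.308  1000000
DIY  fsync  72.944 1000
DELL fsync  2.270  1000
EOF
```

Each fsync'd write takes about 73 ms on the DIY versus about 2.3 ms on the DELL (roughly 32 times faster), while a 1-byte `dd` copy costs about 1.9 µs versus 1.3 µs.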

The results show the DELL is faster overall, so it's back to the drawing board after I've followed all the other hints in this thread.

Jobst

Gordon Messmer

4 Jul

5:46 p.m.

On 7/3/19 11:43 PM, Jobst Schmalenbach wrote:

> How can the DELL take so much longer, although it runs on far better hardware?

It looks like the DIY system has a CPU that's nearly twice as fast as the Dell's. The additional CPU in the Dell will run more tasks concurrently, but it won't make a single process faster.

You might also think that the SSD RAID would make the Dell faster, but that will only be true if the process that you're testing performs a significant amount of IO. If your DB operations are happening mostly in memory (that is, if the data is cached), then the faster CPU will be the primary determining factor.

The other thing that you left out of your description is the amount of data on each server. If your live server has a lot of data in its DB and the dev system has a small dataset suitable for testing, then generally you'd expect that the dev system's data is more likely to live in cache and avoid disk IO, and processing the smaller set will also take less CPU time.
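Whether the DB work is really served from memory can be checked from InnoDB's status counters: `Innodb_buffer_pool_read_requests` counts logical reads and `Innodb_buffer_pool_reads` counts reads that could not be satisfied from the buffer pool and had to go to disk. A minimal sketch, assuming the `mysql` client can log in without a password prompt (e.g. via `~/.my.cnf`); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the share of InnoDB reads that missed the buffer pool and hit disk.
# Reads "name<TAB>value" lines as printed by mysql -N in batch mode.
miss_ratio() {
  awk '
    $1 == "Innodb_buffer_pool_read_requests" { req = $2 }
    $1 == "Innodb_buffer_pool_reads"         { disk = $2 }
    END {
      if (req == 0) { print "no read requests yet"; exit }
      printf "disk reads: %.0f of %.0f requests (%.4f%%)\n", disk, req, 100 * disk / req
    }'
}

buffer_pool_report() {
  mysql -N -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'" | miss_ratio
}
```

The counters are cumulative since mysqld started, so comparing two readings taken around the slow table build on each server is more telling than a single snapshot.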

https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+E3-1245+%40+3.30GHz&...

https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+L5630+%40+2.13GHz&id...

Jobst Schmalenbach

6 Jul

12:52 a.m.

On Thu, Jul 04, 2019 at 10:46:19AM -0700, Gordon Messmer (gordon.messmer@gmail.com) wrote:

> On 7/3/19 11:43 PM, Jobst Schmalenbach wrote:
>> How can the DELL take so much longer, although it runs on far better hardware?
>
> It looks like the DIY system has a CPU that's nearly twice as fast as the Dell's. The additional CPU in the Dell will run more tasks concurrently, but it won't make a single process faster.
>
> You might also think that the SSD RAID would make the Dell faster, but that will only be true if the process that you're testing performs a significant amount of IO. If your DB operations are happening mostly in memory (that is, if the data is cached), then the faster CPU will be the primary determining factor.

When I started trying to figure out the speed difference, I made the buffer pool on the DELL double the size of the one on the DIY.
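Doubling the buffer pool only helps if the data did not already fit. A quick comparison of the pool with the total InnoDB data and index size on each server is sketched below; it assumes passwordless `mysql` access and uses the standard `information_schema.TABLES` columns, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Compare innodb_buffer_pool_size with the size of all InnoDB data + indexes.
fits_in_pool() {                    # fits_in_pool POOL_BYTES DATA_BYTES
  awk -v pool="$1" -v data="$2" 'BEGIN {
    printf "pool %.1f MiB, InnoDB data+index %.1f MiB: %s\n",
      pool / 1048576, data / 1048576,
      (data + 0 <= pool + 0 ? "fits in the buffer pool" : "does NOT fit")
  }'
}

pool_vs_data() {
  local pool data
  pool=$(mysql -N -e "SELECT @@innodb_buffer_pool_size")
  data=$(mysql -N -e "SELECT COALESCE(SUM(data_length + index_length), 0)
                      FROM information_schema.TABLES WHERE engine = 'InnoDB'")
  fits_in_pool "$pool" "$data"
}
```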

> The other thing that you left out of your description is the amount of data on each server. If your live server has a lot of data in its DB and the dev system has a small dataset suitable for testing, then generally you'd expect that the dev system's data is more likely to live in cache and avoid disk IO, and processing the smaller set will also take less CPU time.

Most of the DBs are small, as they hold websites. The biggest DB is the Online Training DB, which is the same on both machines, as I constantly copy the data from the live server to the DIY.

Very good analysis indeed. Makes total sense.

-- Jobst Schmalenbach
Road to hell is paved with NAND gates.

Roberto Ragusa

2:26 p.m.

On 7/6/19 2:52 AM, Jobst Schmalenbach wrote:

> The biggest DB is the Online Training DB, which is the same on both machines, as I constantly copy the data from the live server to the DIY.

Could you try the same operations on COPIES of the databases, on both machines? An original live DB can be slower than a copy because of data-structure fragmentation, garbage collection, etc. (on the filesystem, but also in the tables).
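One way to make such a copy is to dump the database and load it into a new, freshly built database on the same server, then run the same operations against the copy. A minimal sketch, assuming passwordless `mysql`/`mysqldump`/`mysqladmin` access; `training_db` is a hypothetical database name, and `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Make a fresh copy of a database on the same server, so the same operations
# can be timed against the original and the freshly loaded copy.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

copy_db() {                         # copy_db SOURCE_DB [TARGET_DB]
  local src="$1" dst="${2:-${1}_copy}"
  run mysqladmin create "$dst" || return 1
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "mysqldump --single-transaction $src | mysql $dst"
  else
    # --single-transaction gives a consistent InnoDB dump without locking.
    mysqldump --single-transaction "$src" | mysql "$dst"
  fi
}
```

For example, `DRY_RUN=1 copy_db training_db` shows what would run; drop `DRY_RUN=1` to create `training_db_copy`.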

Just a thought about another thing to try, since we have established that the production hardware is indeed faster.

Regards.

-- Roberto Ragusa mail at robertoragusa.it

Steven Tardy

5 Jul

5:18 a.m.

On Thu, Jul 4, 2019 at 2:43 AM Jobst Schmalenbach jobst@barrett.com.au wrote:


As others have said, the DEV server's CPU is a generation newer. For CPU details I often refer to Intel's “ark” pages:

https://ark.intel.com/content/www/us/en/ark/products/47927/intel-xeon-proces... 12M Cache, 2.13 GHz, 5.86 GT/s Intel® QPI

https://ark.intel.com/content/www/us/en/ark/products/52274/intel-xeon-proces... 8M Cache, 3.30 GHz

The “generations” I mentioned are:
- Code Name: Products formerly Westmere EP https://ark.intel.com/content/www/us/en/ark/products/codename/54534/westmere-ep.html
- Code Name: Products formerly Sandy Bridge https://ark.intel.com/content/www/us/en/ark/products/codename/29900/sandy-bridge.html

Westmere systems used DDR3 at 800/1066 MHz; Sandy Bridge systems used DDR3 at 1066/1333 MHz. Not a huge difference, but likely another factor contributing to performance.

I would also look at the power settings in the BIOS and the C-state settings in the BIOS and OS, as disabling C-states (often enabled by default to meet green/Energy Star compliance) can make a noticeable performance difference.
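To see whether frequency scaling is holding the cores back, the current clock of each logical CPU and the cpufreq governor can be read from `/proc/cpuinfo` and sysfs. A minimal sketch; the sysfs file only exists when a cpufreq driver is active, which may not be the case if the firmware handles power management itself. C-state settings need other tools (e.g. `cpupower` from the kernel-tools package) and are not covered. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Summarise current clocks from /proc/cpuinfo-style input and show the
# cpufreq governor if the kernel exposes one.
mhz_summary() {
  awk -F: '/^cpu MHz/ {
      mhz = $2 + 0; n++; sum += mhz
      if (n == 1 || mhz < min) min = mhz
      if (n == 1 || mhz > max) max = mhz
    }
    END {
      if (n == 0) { print "no cpu MHz lines found"; exit }
      printf "%d CPUs: min %.0f, avg %.0f, max %.0f MHz\n", n, min, sum / n, max
    }'
}

show_governor() {
  local f=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  if [[ -r "$f" ]]; then
    echo "cpu0 governor: $(cat "$f")"
  else
    echo "no cpufreq governor exposed (scaling may be under firmware control)"
  fi
}
```

Running `mhz_summary < /proc/cpuinfo; show_governor` once while idle and once during the slow table build shows whether the clocks ramp up under load.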

Hope that helps.

Gordon Messmer

6:48 p.m.

New subject: Have you run "tuned-adm profile throughput-performance" ?

On 7/4/19 10:18 PM, Steven Tardy wrote:

> I would also look at the power settings in the BIOS and the C-state settings in the BIOS and OS, as disabling C-states (often enabled by default to meet green/Energy Star compliance) can make a noticeable performance difference.

I'd be surprised if it did, but now that you mention it, I think that we should probably mention more often that CentOS's default performance policy is power-saving, which will cut maximum performance in half. Every physical system running CentOS should have run "tuned-adm profile throughput-performance".

http://jperrin.org/centos/boosting-centos-server-performance/
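For reference, a sketch that reports the active profile and switches to throughput-performance if something else is active. `tuned-adm active` and `tuned-adm profile NAME` are standard tuned subcommands; the parsing assumes `tuned-adm active` prints a line of the form `Current active profile: NAME` (check the output on your version), and the function names are hypothetical. Switching normally needs root.

```shell
#!/usr/bin/env bash
# Report the active tuned profile and switch to throughput-performance
# if a different profile is active.
active_profile() {                  # parses "Current active profile: NAME"
  sed -n 's/^Current active profile: *//p'
}

ensure_throughput_profile() {
  local current
  current=$(tuned-adm active | active_profile)
  echo "active profile: ${current:-unknown}"
  if [[ "$current" != "throughput-performance" ]]; then
    tuned-adm profile throughput-performance && echo "switched to throughput-performance"
  fi
}
```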

Gordon Messmer

6:52 p.m.

New subject: Have you run "tuned-adm profile throughput-performance" ?

On 7/5/19 11:48 AM, Gordon Messmer wrote:

> On 7/4/19 10:18 PM, Steven Tardy wrote:
>> I would also look at the power settings in the BIOS and the C-state settings in the BIOS and OS, as disabling C-states (often enabled by default to meet green/Energy Star compliance) can make a noticeable performance difference.
>
> I'd be surprised if it did,

I take that back. Disabling power-saving in the firmware probably also disabled CPU frequency scaling, which would prevent CentOS's default policy from scaling the frequency down to its minimum, so I wouldn't be surprised.

Fred Smith

8:46 p.m.

New subject: Have you run "tuned-adm profile throughput-performance" ?

On Fri, Jul 05, 2019 at 11:48:45AM -0700, Gordon Messmer wrote:

> On 7/4/19 10:18 PM, Steven Tardy wrote:
>> I would also look at the power settings in the BIOS and the C-state settings in the BIOS and OS, as disabling C-states (often enabled by default to meet green/Energy Star compliance) can make a noticeable performance difference.
>
> I'd be surprised if it did, but now that you mention it, I think that we should probably mention more often that CentOS's default performance policy is power-saving, which will cut maximum performance in half. Every physical system running CentOS should have run "tuned-adm profile throughput-performance".
>
> http://jperrin.org/centos/boosting-centos-server-performance/

Not for my (admittedly dog-like) Acer Aspire One netbook, a dual-core 1.6 GHz Atom with a whopping 2 gigs of RAM.

It would run for a little while, pause for a minute or two while the hard drive went chunka-chunka, then eventually come back to life. Not pleasant.

-- ---- Fred Smith -- fredex@fcshome.stoneham.ma.us ----------------------------- But God demonstrates his own love for us in this: While we were still sinners, Christ died for us. ------------------------------- Romans 5:8 (niv) ------------------------------

Pete Biggs

6 Jul

1:52 p.m.

New subject: Have you run "tuned-adm profile throughput-performance" ?

On Fri, 2019-07-05 at 11:48 -0700, Gordon Messmer wrote:

> On 7/4/19 10:18 PM, Steven Tardy wrote:
>> I would also look at the power settings in the BIOS and the C-state settings in the BIOS and OS, as disabling C-states (often enabled by default to meet green/Energy Star compliance) can make a noticeable performance difference.
>
> I'd be surprised if it did, but now that you mention it, I think that we should probably mention more often that CentOS's default performance policy is power-saving, which will cut maximum performance in half. Every physical system running CentOS should have run "tuned-adm profile throughput-performance".

I'm a bit confused.

I've just done some quick experiments on an HPC system. It was previously set to whatever the default is and was then changed to "throughput-performance". There was no discernible change in computation time for an 8-core job (on a dual 4-core Xeon; don't judge, it's an old system I use for testing): the overall run time was just under an hour in both cases, give or take 10 seconds.

So my question is, would the tuning parameters be expected to make a difference on long-term CPU bound processes? Or does the CPU just go at full speed if necessary? Does it depend on the CPU generation?

I'm perfectly willing to set all my HPC cluster nodes to whatever is necessary to get the best performance, but will changing the profile to a performance one mean that the machine will use more power when idle?

Finally, is there a decent online source where I can read up on what the different tuned profiles/parameters mean?

Thanks


discuss@lists.centos.org

12 comments

7 participants


participants (7)

Fred Smith
Gordon Messmer
Jobst Schmalenbach
Pete Biggs
Roberto Ragusa
Simon Matter
Steven Tardy