This is really strange. The day before yesterday, I updated the machine here (x86_64) and everything went normally. I rebooted, and everything came back just as I expected. The next morning, I noticed that one of my overnight jobs had finished, but about 8 or 9 minutes late. I really didn't think much of it, but looking at the jobs running today, I'm seeing the same thing: processes that used to take 3.75 hours to complete are now taking 4 hours. (I have cron jobs that process the data after the main run, and they busted.) Nothing has changed in the way the model runs, and there have been no changes to the input data or any other parameter that would cause a longer runtime.
So the question is: did something change in the kernel that would make the machine run slower? I do have threading turned on in the BIOS, and all 4 CPUs are shown in dmesg. In fact, because of another message about lost clock ticks, I looked at the dmesg output after the reboot just to check that I had not run into the same problem. Short of rebooting back into the previous kernel, is there any way to tell if something is slowing the box down?
On Thu, 2006-06-01 at 15:54 -0400, Sam Drinkard wrote:
<snip>
There is an issue with VM configuration that might cause swapping and that COULD slow the machine down.
However ... since it was an "Important kernel security update" ... I would at least read this before booting into the old kernel:
https://rhn.redhat.com/errata/RHSA-2006-0493.html
Here are the details concerning the VM issue: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=188141
In reality, that bug started with the last kernel ... but I have noticed it more with this one.
Johnny,
After reading about the VM issue, I concur. The previous kernel did exhibit the behavior noted in the URL you sent. Looking right now, I see a considerable difference in used memory vs. swap. The software running on the machine generally uses anywhere from 1.2 to 1.4 GB of memory, and I had noticed that swap usage would increase over time while there was still *some* memory available; the memory never got completely used up. The application running right now is a little different from the one that runs late at night and after 16z, so I'll take a peek at it either tonight if I'm up or in the a.m. after the thing starts. I'll also note swap usage as time progresses to see if it increases as before while memory is still free.
Thanks for the info.
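To follow swap usage over time without watching `free` by hand, a small logger can sample /proc/meminfo periodically. This is a minimal Python sketch, assuming a 2.6-style /proc/meminfo in which `MemFree`, `SwapTotal` and `SwapFree` are reported in kB; the function names, the 5-minute interval and the sample count are illustrative choices, not part of any existing tool.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if not fields:
            continue
        try:
            values[key.strip()] = int(fields[0])
        except ValueError:
            # Skip lines whose first field is not a number.
            continue
    return values


def swap_used_kb(info):
    """Swap in use, in kB, computed as SwapTotal - SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]


def sample(path="/proc/meminfo"):
    """Return (MemFree kB, swap used kB) from one read of the meminfo file."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemFree"], swap_used_kb(info)


def log_swap(interval_s=300, samples=12, path="/proc/meminfo"):
    """Print a timestamped MemFree/SwapUsed line every interval_s seconds."""
    for _ in range(samples):
        mem_free, swap_used = sample(path)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} MemFree={mem_free} kB SwapUsed={swap_used} kB", flush=True)
        time.sleep(interval_s)
```

Calling `log_swap()` while the model runs gives a column of SwapUsed values to compare against MemFree; swap climbing while MemFree stays well above zero is the pattern Sam describes.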
On Thu, 2006-06-01 at 19:18 -0400, Sam Drinkard wrote:
Johnny,
After reading about the VM issue, I concur. <snip>
I forget the start of this thread... was it 64-bit only? Anyway, my 32-bit machine locked up a couple of times with symptoms like Sam mentions: lots of swap used and no reason for it. I was using lots of open browsers, a couple of different GUI MUAs, etc.
I turned off things I didn't need for this workstation (behind a firewall, on cable), like sendmail and spamc/spamd (Evolution starts spamd as it needs it, regardless of whether the system service is started), etc.
A *biggie*, maybe, is the stupid readahead and readahead_early stuff. Take a look at their file lists. Some small % suits your needs; the rest is just someone's idea of every possible thing that might speed up initial response. I completely disabled these and have seen no difference. As I expected, after the initial boot most of it is wasted, and the "non-manual" memory management does a better job than someone who was probably told "Make our boot faster than Windoze".
Up now for 6 days running a similar load (I think; I haven't bothered to really measure it), and swap use is still good and response is still good.
I suspect the heavy duty servers a lot of you have can also live without these readahead* things. Put a stopwatch on it and see. YMMV.
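One way to "take a look at their file lists" with numbers is to total the size of the files they name, which gives a rough estimate of how much gets pulled into the page cache at boot. A minimal Python sketch, assuming the list is plain text with one path per line (blank lines and `#` comments skipped); the function names are hypothetical, and you would pass in the list your readahead init script actually uses, since its location and exact format vary by release and are not confirmed here.

```python
import os


def sum_listed_sizes(lines):
    """Total the on-disk size of every file named in an iterable of lines.

    Assumes one path per line; blank lines and '#' comments are skipped.
    Returns (total_bytes, missing_count).
    """
    total = 0
    missing = 0
    for line in lines:
        path = line.strip()
        if not path or path.startswith("#"):
            continue
        try:
            total += os.path.getsize(path)
        except OSError:
            # Listed file no longer exists or cannot be stat()ed.
            missing += 1
    return total, missing


def readahead_list_size(list_path):
    """Read a readahead-style file list from disk and total its file sizes."""
    with open(list_path) as f:
        return sum_listed_sizes(f)
```

For example, `total, missing = readahead_list_size("/path/to/list")` (hypothetical path) followed by `print(total / 2**20, "MB")`; a large total next to a machine's working set is a hint that much of the preload is wasted after boot.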
HTH
William L. Maltby wrote:
On Thu, 2006-06-01 at 19:18 -0400, Sam Drinkard wrote:
<snip>
After an uptime of a little over 3 days now, I'm not seeing any increase in swap as I did before, but then again, I don't exactly recall how long it took before swap started increasing. One thing I do notice is that, for some reason, the applications that normally consume between 1.2 and 1.4 GB of memory are now down to 1.1 - 1.2 GB, and swap is only lightly touched at 181 MB. I don't know if this is due to the kernel update or what, but it would seem logical that it is, since I see different behavior than before. I've not turned off readahead or readahead_early yet, but will do so shortly and see if I can tell any difference in the model run times, which did increase after the kernel update. One thing at a time, I guess :-)
Sam
On Sun, 2006-06-04 at 09:57 -0400, Sam Drinkard wrote:
William L. Maltby wrote:
On Thu, 2006-06-01 at 19:18 -0400, Sam Drinkard wrote:
Johnny,
After reading about the VM issue, I concur. <snip>
I forget the start of this thread... was it 64-bit only? Anyway, my 32-bit machine locked up a couple of times with symptoms like Sam mentions: lots of swap used and no reason for it. I was using lots of open browsers, a couple of different GUI MUAs, etc.
Turned off things I didn't need <snip>
A *biggie*, maybe, is the stupid readahead and readahead_early stuff.
<snip>
Up now for 6 days running a similar load (I think; I haven't bothered to really measure it), and swap use is still good and response is still good.
<snip>
After an uptime of a little over 3 days now, I'm not seeing any increase in swap as I did before, but then again, I don't exactly recall how long it took before swap started increasing.
For my symptoms to appear, it took a combination of uptime and an increase in load. I don't know if this is coincidence or cause and effect.
One thing I do notice is that, for some reason, the applications that normally consume between 1.2 and 1.4 GB of memory are now down to 1.1 - 1.2 GB, and swap is only lightly touched at 181 MB.
Again, mine is only a workstation, so I don't know how this applies to you. My max in the previous session was 64k into swap. That was after about 4 days up, IIRC, and I tried at several points to reproduce the conditions (opened several browsers, multiple users, several different types/instances of MUAs, several ssh sessions to my LAN server, etc.).
I've since turned sendmail back on so I can do some LAN-internal things with it. It's a pretty much stock config, except that I removed the restriction to localhost (a DAEMON_OPTIONS line in the sendmail.mc file). I needed to reboot to test LVM config changes (new stuff for me), and after a day and a half of running, there's 0k in swap with a medium load.
I don't know if this is due to the kernel update or what, but it would seem logical that it is, since I see different behavior than before. I've not turned off readahead or readahead_early yet, but will do so shortly and see if I can tell any difference in the model run times, which did increase after the kernel update. One thing at a time, I guess :-)
Theoretically, there ought to be a decrease in startup times at certain points _early_in_the_uptime_cycle. IMO, after a "steady state" typical loading is achieved (hours, days, weeks??) there should be no or reduced improvement (maybe even a decrease *if* this forces some spurious swap activity?).
*If* your config is anything similar to mine (or any generic?), I doubt you'll find any long-term gain significant enough to warrant even the near-zero maintenance of two more scripts and associated files.
It's the old "It doesn't cost me anything, but I get no benefit either" routine. I almost always opt for excision of the wart. Sometimes that costs later, but that's OK by me.
William L. Maltby wrote:
On Sun, 2006-06-04 at 09:57 -0400, Sam Drinkard wrote:
<snip>
Just before the current model run started, I turned off readahead and readahead_early, and I see a considerable decrease in used memory, down from 1.2 GB to about 975 MB, +/-. Unfortunately, the model did not initialize on time, and I had to bring it up manually after some input files were not available, but I can still tell by the runtime whether there has been any improvement. Normal runtime is about 3.75 hours, and *any* reduction in runtime would be great.
This machine is set up as a workstation, with X and the whole nine yards going, but there are no entertainment things used, nor word processing, etc. The only mail is what I get from the system, though sendmail is alive and active. I do have things set up so that if my primary mail machine goes down, I can enable Evolution on this one with a mouse click. There are no other users except for an occasional login from a friend who assists with some of the software running that I am unfamiliar with at this point. I do see some processes that could probably be turned off, but as long as I don't start hitting swap and thrashing the disks, there probably would be no real benefit in stopping these services, and for some of them I'm not sure I fully understand what all is taking place anyway. Currently there are 150 processes and a load average of 2.32, 2.10, and 2.04, give or take a little.
We'll see what happens this afternoon when the model completes. I'm hoping for good results.
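Since the comparison rests on wall-clock runtime against the roughly 3.75-hour baseline, a small timing wrapper makes the before/after numbers explicit. A minimal Python sketch (needs Python 3.5+ for `subprocess.run`); `BASELINE_HOURS` is taken from the thread, while the function names and the model command you would pass in are hypothetical.

```python
import subprocess
import time

# "Normal" runtime quoted in the thread, in hours.
BASELINE_HOURS = 3.75


def percent_change(elapsed_hours, baseline_hours=BASELINE_HOURS):
    """Percent difference from baseline: positive = slower, negative = faster."""
    return 100.0 * (elapsed_hours - baseline_hours) / baseline_hours


def timed_run(cmd):
    """Run cmd (a list of arguments) and return its wall-clock runtime in hours.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return (time.monotonic() - start) / 3600.0
```

For example, `percent_change(timed_run(["/path/to/model_run"]))` (hypothetical path). By the same formula, the 4-hour runs reported at the start of the thread are about 6.7% slower than the 3.75-hour baseline.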
On Sun, 2006-06-04 at 12:53 -0400, Sam Drinkard wrote:
William L. Maltby wrote:
On Sun, 2006-06-04 at 09:57 -0400, Sam Drinkard wrote:
William L. Maltby wrote:
On Thu, 2006-06-01 at 19:18 -0400, Sam Drinkard wrote:
<snip>
We'll see what happens this afternoon when the model completes. I'm hoping for good results.
Good luck on it. Especially hope the lockup is gone forever.