Hi. A friend of mine was doing real-time video decoding on Fedora Core 13 and he had a performance glitch (1/2 a second freeze) every 5-10 seconds. "top" showed flush-253:0 process at the moment of the freeze.
Major device number 253 corresponds to device-mapper. I advised my friend to re-install his FC13 without LVM, to see if the glitch is related to LVM.
After re-installing FC13 without LVM, he is seeing the glitch every 10 seconds, and it shows flush-8:16 where before it said flush-253:0. Major number 8 is the SCSI disk (sd) driver. So it's not an LVM thing... maybe a kernel thing?
I suggested he try Fedora 14 beta, as it has a newer kernel. Maybe this kernel thing is fixed in the newer kernel.
He also tried CentOS 5.5, and saw a "pdflush" process popping up with the same frequency, and resulting in a similar glitch.
Any other suggestions?
Best, -at
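For reference, the thread names in "top" follow the pattern flush-MAJOR:MINOR, so the device behind a flusher thread can be read off the name. Below is a minimal Python sketch of that decoding. The helper names and the two-entry table are illustrative assumptions, not part of any tool; on a live system /proc/devices is the authoritative list of major numbers.

```python
import re

# Hand-picked subset taken from this thread (assumption: matches this system).
# Device-mapper's major number is often assigned dynamically, so check
# /proc/devices on the machine in question.
KNOWN_BLOCK_MAJORS = {8: "sd (SCSI disk)", 253: "device-mapper"}


def parse_flush_name(name):
    """Split a flusher thread name such as 'flush-8:16' into (major, minor)."""
    m = re.fullmatch(r"flush-(\d+):(\d+)", name)
    if m is None:
        raise ValueError(f"not a flush-MAJOR:MINOR name: {name!r}")
    return int(m.group(1)), int(m.group(2))


def describe(name):
    """Return a one-line description of the device behind a flusher thread."""
    major, minor = parse_flush_name(name)
    driver = KNOWN_BLOCK_MAJORS.get(major, "unknown")
    return f"{name}: major {major} ({driver}), minor {minor}"


if __name__ == "__main__":
    for thread_name in ("flush-253:0", "flush-8:16"):
        print(describe(thread_name))
```

Under the sd driver each whole disk conventionally gets a block of 16 minor numbers, so 8:16 would normally be the second disk (sdb) rather than a partition of the first.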
On Mon, 2010-10-18 at 18:25 -0700, Aleksey Tsalolikhin wrote:
Hi. A friend of mine was doing real-time video decoding on Fedora Core 13 and he had a performance glitch (1/2 a second freeze) every 5-10 seconds. "top" showed flush-253:0 process at the moment of the freeze.
And what is the Priority of it running at? How many Cores also?
He also tried CentOS 5.5, and saw a "pdflush" process popping up with the same frequency, and resulting in a similar glitch.
Any other suggestions?
I would not even be concerned. At the moment I am seeing pdflush pop up on a server every second (load average: 4.51, 3.41, 3.51).
I would only be concerned with the Freeze. What's uname -a on the CentOS machine?
John
On Mon, Oct 18, 2010 at 9:08 PM, JohnS jses27@gmail.com wrote:
On Mon, 2010-10-18 at 18:25 -0700, Aleksey Tsalolikhin wrote:
Hi. A friend of mine was doing real-time video decoding on Fedora Core 13 and he had a performance glitch (1/2 a second freeze) every 5-10 seconds. "top" showed flush-253:0 process at the moment of the freeze.
And what is the Priority of it running at? How many Cores also?
He sees this issue at normal priority and at nice -n -19 / -20.
He has 6 cores with hyperthreading on
3.8 GHz, the memory is 1850 MHz
The system is 980x Intel 6 core
He just told me he has two modes for his decoding program, in one mode the system does not write to disk at all, and there are NO GLITCHES doing it this way; another way, it writes lots of little files as it decodes, and the glitch happens actually every 5-20 seconds.
Would like to get to the bottom of this so he can decode with temp files and without glitches.
Cheers, Aleksey
On 10/19/2010 3:34 PM, Aleksey Tsalolikhin wrote:
On Mon, Oct 18, 2010 at 9:08 PM, JohnS jses27@gmail.com wrote:
On Mon, 2010-10-18 at 18:25 -0700, Aleksey Tsalolikhin wrote:
Hi. A friend of mine was doing real-time video decoding on Fedora Core 13 and he had a performance glitch (1/2 a second freeze) every 5-10 seconds. "top" showed flush-253:0 process at the moment of the freeze.
And what is the Priority of it running at? How many Cores also?
He sees this issue at normal priority and at nice -n -19 / -20.
He has 6 cores with hyperthreading on
3.8 GHz, the memory is 1850 MHz
The system is 980x Intel 6 core
He just told me he has two modes for his decoding program, in one mode the system does not write to disk at all, and there are NO GLITCHES doing it this way; another way, it writes lots of little files as it decodes, and the glitch happens actually every 5-20 seconds.
Would like to get to the bottom of this so he can decode with temp files and without glitches.
Ext3 filesystem? Maybe altering the commit option at mount time would help:
http://www.mjmwired.net/kernel/Documentation/filesystems/ext3.txt#49
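The linked ext3 documentation describes commit=nrsec as the interval at which ext3 syncs its data and metadata, with a default of 5 seconds. Before and after remounting, it can help to confirm which value is actually in effect. The following is a rough Python sketch that pulls the commit value out of a /proc/mounts-style line. The function names and the sample line (device, mount point) are made up for illustration, and the kernel may not list the option at all when the default is in use.

```python
def parse_mounts_line(line):
    """Split one /proc/mounts line into its first four fields."""
    fields = line.split()
    # Field order: device, mount point, filesystem type, options, dump, pass
    return {
        "device": fields[0],
        "mountpoint": fields[1],
        "fstype": fields[2],
        "options": fields[3],
    }


def commit_interval(mount_options, default=5):
    """Return the commit= value in seconds, or the ext3 default if absent."""
    for opt in mount_options.split(","):
        key, sep, value = opt.partition("=")
        if key == "commit" and sep:
            return int(value)
    return default


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the system in this thread.
    sample = "/dev/sdb1 /data ext3 rw,commit=6000,data=ordered 0 0"
    entry = parse_mounts_line(sample)
    print(entry["mountpoint"], commit_interval(entry["options"]), "seconds")
```

On a real machine the same functions could be run over each line of /proc/mounts, filtering on fstype == "ext3".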
On Tue, Oct 19, 2010 at 12:48 PM, Toby Bluhm toby.bluhm@alltechmedusa.com wrote:
Ext3 filesystem? Maybe altering the commit option at mount time would help:
http://www.mjmwired.net/kernel/Documentation/filesystems/ext3.txt#49
Good one, Toby! We'll try that. Thanks!!
Aleksey
On Oct 19, 2010, at 5:33 PM, Aleksey Tsalolikhin atsaloli.tech@gmail.com wrote:
On Tue, Oct 19, 2010 at 12:48 PM, Toby Bluhm toby.bluhm@alltechmedusa.com wrote:
Ext3 filesystem? Maybe altering the commit option at mount time would help:
http://www.mjmwired.net/kernel/Documentation/filesystems/ext3.txt#49
Good one, Toby! We'll try that. Thanks!!
You could also reduce the dirty interval in sysctl so it flushes sooner therefore flushes less data each time.
-Ross
Still seeing the glitch every 5-20 secs after remounting with "commit=6000".
On Tue, Oct 19, 2010 at 4:00 PM, Ross Walker rswwalker@gmail.com wrote:
You could also reduce the dirty interval in sysctl so it flushes sooner therefore flushes less data each time.
OK. It's worth a shot. Any idea what the default value is? I'm not sure what value to put in here. I know I want to reduce it but I don't want to break my friend's system either.
http://www.mjmwired.net/kernel/Documentation/sysctl/vm.txt
dirty_expire_centisecs
This tunable is used to define when dirty data is old enough to be eligible for writeout by the pdflush daemons. It is expressed in 100'ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a pdflush daemon wakes up.
Thanks, -at
"uname -a" shows:
Linux localhost.localdomain 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
On Oct 19, 2010, at 7:34 PM, Aleksey Tsalolikhin atsaloli.tech@gmail.com wrote:
Still seeing the glitch every 5-20 secs after remounting with "commit=6000".
On Tue, Oct 19, 2010 at 4:00 PM, Ross Walker rswwalker@gmail.com wrote:
You could also reduce the dirty interval in sysctl so it flushes sooner therefore flushes less data each time.
OK. It's worth a shot. Any idea what the default value is? I'm not sure what value to put in here. I know I want to reduce it but I don't want to break my friend's system either.
http://www.mjmwired.net/kernel/Documentation/sysctl/vm.txt
dirty_expire_centisecs
This tunable is used to define when dirty data is old enough to be eligible for writeout by the pdflush daemons. It is expressed in 100'ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a pdflush daemon wakes up.
There are several dirty tunables, try a 'sysctl -a | grep dirty'
Try limiting both the amount of dirty memory to hold and the number of seconds to hold it. The defaults are way too liberal, and if you are processing a lot of data they can both expose you to extreme data loss in a system failure and bottleneck your storage during pdflush.
-Ross
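To see the current values before changing anything, the same information that 'sysctl -a | grep dirty' prints can be read straight from /proc/sys/vm. Here is a small Python sketch of that, assuming a Linux system with procfs mounted. The function names are hypothetical, and the set of dirty* files varies between kernel versions.

```python
import os


def read_dirty_tunables(base="/proc/sys/vm"):
    """Return {name: integer value} for every dirty* tunable under base."""
    result = {}
    for name in sorted(os.listdir(base)):
        if not name.startswith("dirty"):
            continue
        with open(os.path.join(base, name)) as fh:
            result[name] = int(fh.read().strip())
    return result


def centisecs_to_seconds(value):
    """The *_centisecs tunables are expressed in hundredths of a second."""
    return value / 100.0


if __name__ == "__main__":
    if os.path.isdir("/proc/sys/vm"):
        for name, value in read_dirty_tunables().items():
            if name.endswith("_centisecs"):
                print(f"vm.{name} = {value} ({centisecs_to_seconds(value)} s)")
            else:
                print(f"vm.{name} = {value}")
```

Recording this output first gives a known-good set of values to restore if a lowered setting makes things worse.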
On Tue, 2010-10-19 at 16:34 -0700, Aleksey Tsalolikhin wrote:
Still seeing the glitch every 5-20 secs after remounting with "commit=6000".
On Tue, Oct 19, 2010 at 4:00 PM, Ross Walker rswwalker@gmail.com wrote:
You could also reduce the dirty interval in sysctl so it flushes sooner therefore flushes less data each time.
OK. It's worth a shot. Any idea what the default value is? I'm not sure what value to put in here. I know I want to reduce it but I don't want to break my friend's system either.
http://www.mjmwired.net/kernel/Documentation/sysctl/vm.txt
dirty_expire_centisecs
This tunable is used to define when dirty data is old enough to be eligible for writeout by the pdflush daemons. It is expressed in 100'ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a pdflush daemon wakes up.
Maybe add this in as well:
vm.dirty_ratio = 50
Start at 0 and work your way up.
John
On Monday, October 18, 2010 09:25:41 pm Aleksey Tsalolikhin wrote:
Hi. A friend of mine was doing real-time video decoding on Fedora Core 13 and he had a performance glitch (1/2 a second freeze) every 5-10 seconds. "top" showed flush-253:0 process at the moment of the freeze.
[snip]
He also tried CentOS 5.5, and saw a "pdflush" process popping up with the same frequency, and resulting in a similar glitch.
Any other suggestions?
What kind of hard drive is this? How is/are your drive(s) set up?
You need the iostat program (in the sysstat package, I think) to give you more detail; there are some pointers to its use in this list's archives.
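iostat derives its figures from kernel counters such as those in /proc/diskstats, so the same idea can be sketched by hand: sample the counters twice and look at the difference. The Python sketch below does that for write rate, average time per write, and how busy the device was. The function names are hypothetical and the two sample lines are invented numbers for a device called sdb, not measurements from the system in this thread.

```python
def parse_diskstats_line(line):
    """Pick the write-related counters out of one /proc/diskstats line."""
    f = line.split()
    return {
        "major": int(f[0]),
        "minor": int(f[1]),
        "name": f[2],
        "writes_completed": int(f[7]),
        "sectors_written": int(f[9]),
        "ms_writing": int(f[10]),
        "ms_doing_io": int(f[12]),
    }


def write_stats(before, after, interval_s):
    """Compare two samples of the same device taken interval_s seconds apart."""
    writes = after["writes_completed"] - before["writes_completed"]
    ms_writing = after["ms_writing"] - before["ms_writing"]
    busy_ms = after["ms_doing_io"] - before["ms_doing_io"]
    return {
        "writes_per_s": writes / interval_s,
        "avg_write_ms": (ms_writing / writes) if writes else 0.0,
        "util_pct": 100.0 * busy_ms / (interval_s * 1000.0),
    }


if __name__ == "__main__":
    # Invented samples, one second apart, for illustration only.
    t0 = parse_diskstats_line("8 16 sdb 100 0 800 50 1000 200 16000 4000 0 3000 4050")
    t1 = parse_diskstats_line("8 16 sdb 100 0 800 50 1500 300 24000 9000 2 3900 9050")
    print(write_stats(t0, t1, interval_s=1.0))
```

A device that sits near 100% busy with a high average write time whenever the flush thread runs points at the disk itself rather than at the kernel's flush settings.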
Thanks, Ross, JohnS and Lamar for your kind responses. It turned out my friend is using a 7200 RPM disk for his write-lots-of-little-files activity, so we're looking at upgrading that to 15000 RPM or getting him a FusionIO memory card.
Thank you!
Aleksey
On Wed, Oct 20, 2010 at 2:50 PM, Lamar Owen lowen@pari.edu wrote:
On Monday, October 18, 2010 09:25:41 pm Aleksey Tsalolikhin wrote:
Hi. A friend of mine was doing real-time video decoding on Fedora Core 13 and he had a performance glitch (1/2 a second freeze) every 5-10 seconds. "top" showed flush-253:0 process at the moment of the freeze.
[snip]
He also tried CentOS 5.5, and saw a "pdflush" process popping up with the same frequency, and resulting in a similar glitch.
Any other suggestions?
What kind of hard drive is this? How is/are your drive(s) set up?
You need the iostat program (in the sysstat package, I think) to give you more detail; there are some pointers to its use in this list's archives.
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos