I'm trying to optimize some database app running on a CentOS server and wanted to confirm some things about the disk/file caching mechanism.
From what I've read, Linux has a Virtual Filesystem (VFS) layer that sits between the physical file systems and everything else, so no matter what FS is used, applications are still addressing the VFS. Because of this, disk caching is done in the page cache on a per-file (inode) page basis.
I'm assuming that this is still the case in CentOS or am I badly mistaken?
If that is correct, then here is my scenario and hypothesis.
Assume the server has xxx MB of free memory and the database consists of several tables, each more than xxx MB in size, so no single table will fit entirely into memory. Also assume other processes do not interfere with the caching behaviour or available memory, etc.
Given that caching behaviour, if the DBMS only accesses a set of inodes totalling less than xxx MB, is it therefore likely to be hitting the cache all the time, and hence faster?
My thought is that if this is the case, then I could likely speed up the application if I further split the tables into parts that are frequently accessed and parts that are rarely touched.
e.g. the table may currently have rows with 20 fields totalling 1 KB/row, but very often only 5 of the 20 fields are used in actual processing. Reading x rows from this table may access x inodes' worth of data, which would not fit into the cache/memory.
However, if I now break the table into two parts, with those 5 fields in a smaller table, there should be a speed increase, since reading the same x rows would touch only a fraction of those inodes. Furthermore, the smaller table would more likely fit into the disk/memory cache for even faster access.
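For what it's worth, the hot/cold split described above can be sketched with SQLite (purely illustrative: the table and column names are invented, and a real DBMS's on-disk layout will differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hot/cold split: the few frequently-read columns go in a narrow table,
# the rarely-touched bulk moves to a side table sharing the same key.
cur.execute("""CREATE TABLE orders_hot (
    id INTEGER PRIMARY KEY, status TEXT, amount REAL)""")
cur.execute("""CREATE TABLE orders_cold (
    id INTEGER PRIMARY KEY REFERENCES orders_hot(id),
    notes TEXT, attachment BLOB)""")

cur.execute("INSERT INTO orders_hot VALUES (1, 'open', 9.99)")
cur.execute("INSERT INTO orders_cold VALUES (1, 'long text...', x'00')")

# Frequent queries touch only the narrow table (fewer pages to cache);
# the full row is still reachable with a join when needed.
row = cur.execute("""SELECT h.status, c.notes
                     FROM orders_hot h JOIN orders_cold c USING (id)
                     WHERE h.id = 1""").fetchone()
print(row)  # ('open', 'long text...')
```

Whether this actually wins depends on whether the DBMS stores rows contiguously on disk, and on whether a covering index would already give the same effect.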
Or would I simply be duplicating what the DBMS's index files would already be doing and therefore see no improvement?
Noob Centos Admin wrote:
I'm trying to optimize some database app running on a CentOS server and wanted to confirm some things about the disk/file caching mechanism.
If you want a fast database, forget about file system caching: use direct I/O and put your memory to better use, i.e. application-level caching.
nate
nate wrote:
Noob Centos Admin wrote:
I'm trying to optimize some database app running on a CentOS server and wanted to confirm some things about the disk/file caching mechanism.
If you want a fast database forget about file system caching, use Direct I/O and put your memory to better use - application level caching.
Or - if you are doing something like serving web content that can be a bit stale you might put memcache in front of the DB with a few changes in the client access.
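The memcache idea above is the cache-aside pattern; a minimal sketch, where a plain dict stands in for memcached and the key names and TTL are hypothetical:

```python
import time

cache = {}   # stands in for memcached
TTL = 30     # seconds the content is allowed to be stale

def slow_db_query(key):
    # Placeholder for the real, expensive database query.
    return f"result-for-{key}"

def get(key):
    hit = cache.get(key)
    if hit is not None:
        value, stored = hit
        if time.time() - stored < TTL:
            return value            # cache hit: the DB is never touched
    value = slow_db_query(key)      # miss or expired: go to the DB
    cache[key] = (value, time.time())
    return value

print(get("front-page"))  # first call hits the DB
print(get("front-page"))  # second call is served from the cache
```

A real deployment would use a memcached client over the network so all web frontends share one cache, but the read path is the same shape.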
Hi,
If you want a fast database forget about file system caching, use Direct I/O and put your memory to better use - application level caching.
The web application is written in PHP and runs off MySQL and/or Postgresql. So I don't think I can access the raw disk data directly, nor do I think it would be safe since that bypasses the DBMS's checks.
Noob Centos Admin wrote:
Hi,
If you want a fast database forget about file system caching, use Direct I/O and put your memory to better use - application level caching.
The web application is written in PHP and runs off MySQL and/or Postgresql. So I don't think I can access the raw disk data directly, nor do I think it would be safe since that bypasses the DBMS's checks.
Which is it? mysql or postgresql or both? Have you actually determined that i/o is in fact the bottleneck?
Is the webapp maintaining persistent connections to mysql, or is it continually connecting and disconnecting? mysql sucks big time if you have a good few hundred connections being set up and torn down all the time.
Are complicated relationships being stored in postgresql and not in mysql? I do not know how things are now but mysql has a history of only being good for simple selects.
On Tue, 2010-01-26 at 08:19 +0800, Christopher Chan wrote:
Are complicated relationships being stored in postgresql and not in mysql? I do not know how things are now but mysql has a history of only being good for simple selects.
Selects can get very uppity for mysql, as in "VIEWS". They can do Concat, Inner Join and Outer Join among many more things. VIEW myview AS SELECT can do some very, very logical calcs and predictions. I promise it is not just for simple selects.
John
JohnS wrote:
On Tue, 2010-01-26 at 08:19 +0800, Christopher Chan wrote:
Are complicated relationships being stored in postgresql and not in mysql? I do not know how things are now but mysql has a history of only being good for simple selects.
Selects can get very upity for mysql as in "VIEWS". They can do Concat, Inner Join and Outter among many more things. VIEW myview as SELECT can do some very very logical calcs and predictions. I promise it is not just for simple selects.
By 'being good only for simple selects' I meant performance wise. Which is what this thread is all about - performance. Sure you can make complicated queries on mysql but compared to postgresql they would take quite some time. Again, this is based on stuff in the past. Maybe mysql has improved now.
I am just happy that more stuff started supporting postgresql before the Sun buyout. They would have had some time to mature instead of a frantic 'we need to add/convert to postgresql just in case'. But I will still go for mysql with connection caching if it is just a simple table lookup that needs to be remotely queryable.
On Tue, 2010-01-26 at 13:41 +0800, Christopher Chan wrote:
JohnS wrote:
On Tue, 2010-01-26 at 08:19 +0800, Christopher Chan wrote:
Are complicated relationships being stored in postgresql and not in mysql? I do not know how things are now but mysql has a history of only being good for simple selects.
Selects can get very upity for mysql as in "VIEWS". They can do Concat, Inner Join and Outter among many more things. VIEW myview as SELECT can do some very very logical calcs and predictions. I promise it is not just for simple selects.
By 'being good only for simple selects' I meant performance wise. Which is what this thread is all about - performance. Sure you can make complicated queries on mysql but compared to postgresql they would take quite some time. Again, this is based on stuff in the past. Maybe mysql has improved now.
Sure, I knew what you meant, but we're gonna bang heads on your definition of simple selects. I can't compare performance to postgresql, but I am willing to bet that mysql can do a lot more. For something like a "Breadth First" or "Depth First" logical operation, it is sad for me to even say MySQL is faster in that area with predictions than MSSQL. Having said that, I really love mssql and sqlce. Now we're getting OT.
Great things started to happen with mysql from version 5 onwards. Now it's just probably going to wither away. Who really knows?
I am just happy that more stuff started supporting postgresql before the Sun buyout. They would have had some time to mature instead of a frantic 'we need to add/convert to postgresql just in case'. But I will still go for mysql with connection caching if it is just a simple table lookup that needs to be remotely querable.
On Jan 26, 2010, at 2:30 AM, JohnS jses27@gmail.com wrote:
On Tue, 2010-01-26 at 13:41 +0800, Christopher Chan wrote:
JohnS wrote:
On Tue, 2010-01-26 at 08:19 +0800, Christopher Chan wrote:
Are complicated relationships being stored in postgresql and not in mysql? I do not know how things are now but mysql has a history of only being good for simple selects.
Selects can get very upity for mysql as in "VIEWS". They can do Concat, Inner Join and Outter among many more things. VIEW myview as SELECT can do some very very logical calcs and predictions. I promise it is not just for simple selects.
By 'being good only for simple selects' I meant performance wise. Which is what this thread is all about - performance. Sure you can make complicated queries on mysql but compared to postgresql they would take quite some time. Again, this is based on stuff in the past. Maybe mysql has improved now.
Sure, I knew what you meant, but we gonna Bang Heads on your definition of simple selects. I can't compare performance to postgresql but I am willing to bet that mysql can do alot more. Doing something like a "Breadth First" or "Depth First" logical operation, it is sad for me to even say MySQL is faster in that area with predictions than MSSQL. Having said that I really love mssql and sqlce. Now we getting OT.
Great things started to happen with mysql @ version 5 >. Now it's just probally going to wither away. Who really knows?
Some really nice things are happening with postgresql as well, you should check it out.
-Ross
On Tue, Jan 26, 2010 at 9:48 AM, Ross Walker rswwalker@gmail.com wrote:
Some really nice things are happening with postgresql as well, you should check it out.
This was a great thread. For one, it's interesting to see the approaches you can take to solve an issue. I.e., we can tune the OS to quite a degree, but many times just optimizing a query can lead to much greater improvements. We can also sometimes throw hardware at the problem; and sometimes doing that is faster/cheaper than a heuristic approach :D.
On 1/26/2010 9:46 AM, Kwan Lowe wrote:
On Tue, Jan 26, 2010 at 9:48 AM, Ross Walkerrswwalker@gmail.com wrote:
Some really nice things are happening with postgresql as well, you should check it out.
This was a great thread. For one, it's interesting to see the approaches you can take to solve an issue. I.e., we can tune the OS to quite a degree, but many times just optimizing a query can lead to much greater improvements. We can also sometimes throw hardware at the problem; and sometimes doing that is faster/cheaper than a heuristic approach :D.
Or, if the issue is scaling up the number of reads for web access you might be able to simply add caching in the web app, either locally or with a distributed tool like memcache to offload the work from the DB.
On Tue, 2010-01-26 at 09:48 -0500, Ross Walker wrote:
Great things started to happen with mysql @ version 5 >. Now it's just probally going to wither away. Who really knows?
Some really nice things are happening with postgresql as well, you should check it out.
-Ross
Yes, I have been having some thoughts for the past few weeks of tearing into it, with all that is happening with MySQL.
John
Noob Centos Admin wrote:
The web application is written in PHP and runs off MySQL and/or Postgresql. So I don't think I can access the raw disk data directly, nor do I think it would be safe since that bypasses the DBMS's checks.
This is what I use for MySQL (among other things)
log-queries-not-using-indexes
long_query_time = 3
key_buffer = 50M
bulk_insert_buffer_size = 8M
table_cache = 1000
sort_buffer_size = 8M
read_buffer_size = 4M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 8M
thread_cache = 40
query_cache_size = 256M
query_cache_type = 1
query_cache_limit = 20M

default-storage-engine = innodb
innodb_file_per_table
innodb_buffer_pool_size = 20G        # assumes a decent amount of RAM; this is the max I can set with 32G w/o swapping
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 1999M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT       # this turns on direct I/O
innodb_lock_wait_timeout = 120
innodb_log_buffer_size = 13M
innodb_open_files = 1024
innodb_thread_concurrency = 16
sync_binlog = 1
set-variable = tmpdir=/var/lib/mysql/tmp   # force tmp onto the SAN rather than local disk
Running MySQL 5.0.51a (built from SRPMS)
nate
On Mon, 2010-01-25 at 01:09 +0800, Noob Centos Admin wrote:
e.g. the table may currently have rows with 20 fields and total 1KB/row, but very often say only 5/20 fields are used in actual processing. Reading x rows from this table may access x inodes which would not fit into the cache/memory.
20 fields or columns is really nothing. BUT that's dependent on the type of data being inserted.
However if now I break the table into two parts with those 5 fields into a smaller table, there would be a speed increase since the reading the same x rows from this table would only access 1/x inodes. Further more, these would more likely fit into the disk/memory cache for even faster access.
OK, so break the one table down into 2 or more; then you will have joins and clustered indexes, possibly slowing you down more. That is greatly dependent on your select, delete, and update scripts.
Or would I simply be duplicating what the DBMS's index files would already be doing and therefore see no improvement?
Possibly very correct, but Nate is also very correct about how you are accessing the DB, i.e. direct I/O. Your fastest access comes from optimized sprocs, triggers and TSQL. Slam enough memory into the server and load it in memory. If speed is what you're after, why are you worried about VFS? CentOS does support raw disk access (no filesystem).
John
Hi,
20 feilds or columns is really nothing. BUT That's dependant on the type of data being inserted.
20 was an arbitrary number :)
Ok so break the one table down create 2 or more, then you will have "Joins" & clustered indexes thus slowing you down more possibly. That is greatly dependant on your select, delete, and update scripts.
That was the reason the original developer gave for having these massive rows! Admittedly it is easier to read, but when each row also contains text/blob fields, they tend to grow rather big. Some users have been complaining that the server seems to be getting sluggish, so I'm trying to plan ahead and make changes before it becomes a real problem.
Possibly very correct, but Nate is very correct on how you are accessing the DB ie direct i/o also. Your fastest access come in optimized SPROCS and Triggers and TSQL. Slam enough memory into the server and load it in memory.
It's an old server with all slots populated, so adding memory is not an option. I thought of doing an image and porting it into a VM on a newer/faster machine. But at the rate this client's usage is growing, I foresee that as simply delaying the inevitable.
If speed is what your after why are you worried about VFS? CentOS does support Raw Disk Access (no filesystem).
To be honest, I don't really care about VFS since I didn't know it existed until I started looking up Linux file/disk caching :D
So I assumed that was what PHP and DBMS like MySQL/Postgresql would be working through. It made sense since they wouldn't need to worry about what filesystem was really used.
On Jan 25, 2010, at 6:41 AM, Noob Centos Admin centos.admin@gmail.com wrote:
Hi,
20 feilds or columns is really nothing. BUT That's dependant on the type of data being inserted.
20 was an arbitary number :)
Ok so break the one table down create 2 or more, then you will have "Joins" & clustered indexes thus slowing you down more possibly. That is greatly dependant on your select, delete, and update scripts.
That was the reason the original develop gave for having these massive rows! Admittedly it is easier to read but when each row also contains text/blob fields, they tend to grow rather big. Some users have been complaining the server seems to be getting sluggish so I'm trying to plan ahead and make changes before it becomes a real problem.
Split the TEXT/BLOB data out of the primary table into tables of their own, indexed to the primary table by its key column.
Possibly very correct, but Nate is very correct on how you are accessing the DB ie direct i/o also. Your fastest access come in optimized SPROCS and Triggers and TSQL. Slam enough memory into the server and load it in memory.
It's an old server with all slots populated so adding memory is not an option. I thought of doing an image and porting it into a VM on a newer/faster machine. But then at the rate this client's usage growing, I foresee that as simply delaying the inevitable.
Think about distributing the parts to different boxes as necessary. You can start with the DBMS which is the logical candidate.
If speed is what your after why are you worried about VFS? CentOS does support Raw Disk Access (no filesystem).
To be honest, I don't really care about VFS since I didn't know it existed until I started looking up Linux file/disk caching :D
So I assumed that was what PHP and DBMS like MySQL/Postgresql would be working through. It made sense since they wouldn't need to worry about what filesystem was really used.
On the DBMS backend, give it plenty of memory, good storage for the workload and good networking.
On the Apache/PHP side, look for a good DBMS inter-connect and some PHP caching module and of course enough CPU for the PHP code and network for Apache+DBMS inter-connect.
If you wanted to split it up even more you could look into some sort of PHP distributed cache/processing system and have PHP processed behind Apache.
-Ross
On Mon, 2010-01-25 at 09:45 -0500, Ross Walker wrote:
On Jan 25, 2010, at 6:41 AM, Noob Centos Admin centos.admin@gmail.com wrote:
Hi,
20 feilds or columns is really nothing. BUT That's dependant on the type of data being inserted.
20 was an arbitary number :)
Ok so break the one table down create 2 or more, then you will have "Joins" & clustered indexes thus slowing you down more possibly. That is greatly dependant on your select, delete, and update scripts.
That was the reason the original develop gave for having these massive rows! Admittedly it is easier to read but when each row also contains text/blob fields, they tend to grow rather big. Some users have been complaining the server seems to be getting sluggish so I'm trying to plan ahead and make changes before it becomes a real problem.
Split the TEXT/BLOB data out of the primary table into tables of their own indexed to the primary table by it's key column.
Would seem like a good idea in theory only. You're back to all but basic joins on the table now. Instead, look at the way your PHP code is encoding the BLOB data and, if you really need the speed, since it's a MySQL DB, make your own custom C API for mysql to encode the BLOB. The DB can do that much faster than your PHP app ever thought of.
Possibly very correct, but Nate is very correct on how you are accessing the DB ie direct i/o also. Your fastest access come in optimized SPROCS and Triggers and TSQL. Slam enough memory into the server and load it in memory.
It's an old server with all slots populated so adding memory is not an option. I thought of doing an image and porting it into a VM on a newer/faster machine. But then at the rate this client's usage growing, I foresee that as simply delaying the inevitable.
Think about distributing the parts to different boxes as necessary. You can start with the DBMS which is the logical candidate.
If speed is what your after why are you worried about VFS? CentOS does support Raw Disk Access (no filesystem).
To be honest, I don't really care about VFS since I didn't know it existed until I started looking up Linux file/disk caching :D
So I assumed that was what PHP and DBMS like MySQL/Postgresql would be working through. It made sense since they wouldn't need to worry about what filesystem was really used.
On the DBMS backend, give it plenty of memory, good storage for the workload and good networking.
On the Apache/PHP side, look for a good DBMS inter-connect and some PHP caching module and of course enough CPU for the PHP code and network for Apache+DBMS inter-connect.
Make sure PHP is creating and tearing down connections properly on insert: flush(), then connection.close().
If you wanted to split it up even more you could look into some sort of PHP distributed cache/processing system and have PHP processed behind Apache.
You really need a good SQL book to sit down and read, like the one by Peter Brawley and Arthur Fuller @ artfullsoftware.com, coauthored by the original mysql owners. It is the best one you will get.
On Jan 25, 2010, at 6:22 PM, JohnS jses27@gmail.com wrote:
On Mon, 2010-01-25 at 09:45 -0500, Ross Walker wrote:
On Jan 25, 2010, at 6:41 AM, Noob Centos Admin centos.admin@gmail.com wrote:
Hi,
20 feilds or columns is really nothing. BUT That's dependant on the type of data being inserted.
20 was an arbitary number :)
Ok so break the one table down create 2 or more, then you will have "Joins" & clustered indexes thus slowing you down more possibly. That is greatly dependant on your select, delete, and update scripts.
That was the reason the original develop gave for having these massive rows! Admittedly it is easier to read but when each row also contains text/blob fields, they tend to grow rather big. Some users have been complaining the server seems to be getting sluggish so I'm trying to plan ahead and make changes before it becomes a real problem.
Split the TEXT/BLOB data out of the primary table into tables of their own indexed to the primary table by it's key column.
Would seem like a good idea in Theory only. Your back to all but Basic Joins on the Table now.
I'm sure not all transactions will require the BLOB/TEXT fields.
Instead look at the way your PHP Code is Encoding the BLOB Data and if your really need the speed since now it's MySQL DB, make you own custom C API for mysql to encode the BLOB. The DB can do this like that much faster then your PHP app ever thought of.
I believe the OP said he was running postgresql.
Possibly very correct, but Nate is very correct on how you are accessing the DB ie direct i/o also. Your fastest access come in optimized SPROCS and Triggers and TSQL. Slam enough memory into the server and load it in memory.
It's an old server with all slots populated so adding memory is not an option. I thought of doing an image and porting it into a VM on a newer/faster machine. But then at the rate this client's usage growing, I foresee that as simply delaying the inevitable.
Think about distributing the parts to different boxes as necessary. You can start with the DBMS which is the logical candidate.
If speed is what your after why are you worried about VFS? CentOS does support Raw Disk Access (no filesystem).
To be honest, I don't really care about VFS since I didn't know it existed until I started looking up Linux file/disk caching :D
So I assumed that was what PHP and DBMS like MySQL/Postgresql would be working through. It made sense since they wouldn't need to worry about what filesystem was really used.
On the DBMS backend, give it plenty of memory, good storage for the workload and good networking.
On the Apache/PHP side, look for a good DBMS inter-connect and some PHP caching module and of course enough CPU for the PHP code and network for Apache+DBMS inter-connect.
Make sure PHP is Creating and Tearing down connections on insert and flush() connection.close()
If you wanted to split it up even more you could look into some sort of PHP distributed cache/processing system and have PHP processed behind Apache.
You really need a good SQL Book to sit down and read like the one by: Peter Brawley and Arthur Fuller @ artfullsoftware.com, coauthored by the original mysql owners. It is the best one you will get.
Is it restricted to purely MySQL?
-Ross
On Mon, 2010-01-25 at 18:51 -0500, Ross Walker wrote:
Instead look at the way your PHP Code is Encoding the BLOB Data and if your really need the speed since now it's MySQL DB, make you own custom C API for mysql to encode the BLOB. The DB can do this like that much faster then your PHP app ever thought of.
I believe the OP said he was running postgresql.
Quoted from the OP's previous mail, he's not sure lol...
"""The web application is written in PHP and runs off MySQL and/or Postgresql."""
John
On Jan 25, 2010, at 7:02 PM, JohnS jses27@gmail.com wrote:
On Mon, 2010-01-25 at 18:51 -0500, Ross Walker wrote:
Instead look at the way your PHP Code is Encoding the BLOB Data and if your really need the speed since now it's MySQL DB, make you own custom C API for mysql to encode the BLOB. The DB can do this like that much faster then your PHP app ever thought of.
I believe the OP said he was running postgresql.
Quoted from OPs previous mail hes not sure lol....
"""The web application is written in PHP and runs off MySQL and/or Postgresql."""
Ah, well #1 on his list then is to figure out what he is running!
If there are BLOB/TEXT fields, those should really be queried separately and put in some cache shared by the PHP app (or discarded, depending on how frequently they're used). Not knowing what they are: probably avatars, which should be cached, as opposed to uploads or such, which shouldn't. Who knows without more info.
-Ross
Hi,
I believe the OP said he was running postgresql.
Quoted from OPs previous mail hes not sure lol....
"""The web application is written in PHP and runs off MySQL and/or Postgresql."""
Ah, well #1 on his list then is to figure out what he is running!
LOL, I know it sounds quite noobish, coming across like I've no idea what DBMS it is running on. The system currently runs on MySQL but part of my update requirement was to decouple the DBMS so that we can make an eventual switch to postgresql.
Hence the solution cannot be dependent on some specific MySQL functionality.
Ah, well #1 on his list then is to figure out what he is running!
LOL, I know it sounds quite noobish, coming across like I've no idea what DBMS it is running on. The system currently runs on MySQL but part of my update requirement was to decouple the DBMS so that we can make an eventual switch to postgresql.
Hence the solution cannot be dependent on some specific MySQL functionality.
mysql's isam tables have a reputation for surviving just about anything and great builtin replication support...
postgresql less so (I suspect due to fake fsync/fsyncdata in the days before barriers) but maybe things have improved a lot nowadays.
Why are you switching?
MySQL's acquisition was one of the factors; the client wants to keep everything on the opensource side as far as possible.
On the technical side, all tables are using the InnoDB engine because myISAM doesn't support either. Also previously during development, it was discovered that on some particular application/function, MyISAM caused a heavy load that went away after switching to InnoDB.
Also, part of my idea was to subsequently put the tables on different disks for a further improvement. Postgresql supports that (tablespaces), while MySQL appears to require all the tables to remain on the same filesystem.
There were other considerations that were discussed internally; without digging up docs, off hand these are the key factors I can recall that drove the decision to eventually replace MySQL with Postgresql.
On 1/27/10, Chan Chung Hang Christopher christopher.chan@bradbury.edu.hk wrote:
Ah, well #1 on his list then is to figure out what he is running!
LOL, I know it sounds quite noobish, coming across like I've no idea what DBMS it is running on. The system currently runs on MySQL but part of my update requirement was to decouple the DBMS so that we can make an eventual switch to postgresql.
Hence the solution cannot be dependent on some specific MySQL functionality.
mysql's isam tables have a reputation for surviving just about anything and great builtin replication support...
postgresql less so (I suspect due to fake fsync/fsyncdata in the days before barriers) but maybe things have improved a lot nowadays.
Why are you switching?
Chan Chung Hang Christopher wrote:
Ah, well #1 on his list then is to figure out what he is running!
LOL, I know it sounds quite noobish, coming across like I've no idea what DBMS it is running on. The system currently runs on MySQL but part of my update requirement was to decouple the DBMS so that we can make an eventual switch to postgresql.
Hence the solution cannot be dependent on some specific MySQL functionality.
mysql's isam tables have a reputation for surviving just about anything
...or people just didn't notice the loss of unflushed data since there weren't any relational constraints to break.
On Wednesday, January 27, 2010 09:26 PM, Les Mikesell wrote:
Chan Chung Hang Christopher wrote:
Ah, well #1 on his list then is to figure out what he is running!
LOL, I know it sounds quite noobish, coming across like I've no idea what DBMS it is running on. The system currently runs on MySQL but part of my update requirement was to decouple the DBMS so that we can make an eventual switch to postgresql.
Hence the solution cannot be dependent on some specific MySQL functionality.
mysql's isam tables have a reputation for surviving just about anything
...or people just didn't notice the loss of unflushed data since there weren't any relational constraints to break.
Maybe...there is also its convenience of being able to be copied to any os using the same cpu architecture. I certainly do not remember loss of rows after crashes but who knows...
Which version of mysql is GPL anyway?
On Jan 27, 2010, at 7:30 AM, Chan Chung Hang Christopher <christopher.chan@bradbury.edu.hk
wrote:
mysql's isam tables have a reputation for surviving just about anything and great builtin replication support...
postgresql less so (I suspect due to fake fsync/fsyncdata in the days before barriers) but maybe things have improved a lot nowadays.
It has changed a lot since then, when it depended on the OS's fsync routine to assure data integrity. Now it uses barriers (where supported).
But if you're doing mysql on top of LVM you're basically doing the same, because LVM (other than in current kernels) doesn't support barriers.
Still if you have a battery backed write-caching controller that negates the fsync risk, LVM or not, mysql or postgresql.
-Ross
Hi,
On 1/27/10, Ross Walker rswwalker@gmail.com wrote:
But if your doing mysql on top of LVM your basically doing the same, cause LVM (other then current kernels) doesn't support barriers.
Still if you have a battery backed write-caching controller that negates the fsync risk, LVM or not, mysql or postgresql.
This is a bit of a surprise. Am I understanding correctly that running postgresql or mysql on top of LVM negates any data reliability measures the DBMS might have in the event of an unexpected shutdown?
I have several servers configured to run LVM on top of MD1 for the convenience of being able to add more space to a volume in the future. I didn't realize this was a reliability risk. :(
On Jan 27, 2010, at 10:20 AM, Noob Centos Admin centos.admin@gmail.com wrote:
Hi,
On 1/27/10, Ross Walker rswwalker@gmail.com wrote:
But if your doing mysql on top of LVM your basically doing the same, cause LVM (other then current kernels) doesn't support barriers.
Still if you have a battery backed write-caching controller that negates the fsync risk, LVM or not, mysql or postgresql.
This is a bit of a surpise. Am I understanding correctly that running postgresql or mysql on top of LVM negates any data reliability measures the DBMS might have in the event of an unexpected shutdown?
I have several servers configured to run LVM on top of MD1 for the convenience of being able to add more space to a volume in the future. I didn't realize this was a reliability risk. :(
Sorry to be the bearer of bad news, but on top of LVM on CentOS/RHEL the best assurance you're going to get is fsync(), meaning the data is out of the kernel but probably still in the disk's write cache. Make sure you have a good UPS setup, so the disks can flush after main power loss.
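To be concrete about what that guarantee looks like from an application, this is roughly what a DBMS's commit path does (file name and record content are made up; the point is what fsync() does and does not promise):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal.log")  # hypothetical log file
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"commit record\n")   # data is now in the kernel page cache
os.fsync(fd)                       # kernel pushes it toward the device...
os.close(fd)
# ...but without barrier support, or with the drive's write cache enabled,
# "to the device" may mean only the drive's volatile cache, which is
# exactly the gap a power loss (or battery-backed controller) decides.
with open(path, "rb") as f:
    print(f.read())  # b'commit record\n'
```

InnoDB's innodb_flush_log_at_trx_commit and PostgreSQL's wal sync settings control how often this fsync() happens, but neither can reach past a cache the block layer doesn't flush.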
-Ross
Sorry to be the bearer of bad news, but on top of LVM on CentOS/RHEL the best assurance your going to get is fsync(), meaning the data is out of the kernel, but probably still on disk write cache. Make sure you have a good UPS setup, so the disks can flush after main power loss.
Or turn off write caching...
On Jan 27, 2010, at 7:50 PM, Christopher Chan <christopher.chan@bradbury.edu.hk
wrote:
Sorry to be the bearer of bad news, but on top of LVM on CentOS/RHEL the best assurance your going to get is fsync(), meaning the data is out of the kernel, but probably still on disk write cache. Make sure you have a good UPS setup, so the disks can flush after main power loss.
Or turn off write caching...
Have you tried doing any kind of write with write caching turned off? It is so horribly slow as to be almost useless.
If you need to turn write-caching off then I would start looking at SSD drives with capacitor-based caches.
-Ross
On 1/28/2010 8:48 AM, Ross Walker wrote:
Sorry to be the bearer of bad news, but on top of LVM on CentOS/RHEL the best assurance you're going to get is fsync(), meaning the data is out of the kernel, but probably still in the disk's write cache. Make sure you have a good UPS setup, so the disks can flush after main power loss.
Or turn off write caching...
Have you tried doing any kind of write with write caching turned off? It is so horribly slow as to be almost useless.
If you need to turn write-caching off then I would start looking at SSD drives with capacitor-based caches.
I wonder if the generally horrible handling Linux has always done for fsync() is the real reason Oracle spun off their own distro? Do they get it better?
Les Mikesell wrote:
I wonder if the generally horrible handling Linux has always done for fsync() is the real reason Oracle spun off their own distro? Do they get it better?
Anyone in their right mind with Oracle would be using ASM and direct I/O so I don't think it was related.
http://www.oracle.com/technology/pub/articles/smiley_10gdb_install.html#asm http://www.ixora.com.au/tips/avoid_buffered_io.htm
"The file system cache should be used to buffer non-Oracle I/O only. Using it to attempt to enhance the caching of Oracle data just wastes memory, and lots of it. Oracle can cache its own data much more effectively than the operating system can."
Which leads me back to my original response: forget about the file system cache. If you want performance, go for application-level caching, whether it's DB caching or other caching like the memcached mentioned by someone.
Oracle did it because they wanted to control the entire stack.
nate
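The application-level caching recommended above can be sketched in a few lines. This toy in-process cache (class and function names are invented; memcached would play this role across machines) shows the pattern of only touching the database on a miss:

```python
import time

class AppCache:
    """Tiny in-process stand-in for an application-level cache
    (memcached would play this role across machines)."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        value, expires = hit
        if time.monotonic() > expires:
            del self._store[key]  # entry expired, drop it
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

def fetch_row(cache, key, query_db):
    # Serve from cache; only hit the database on a miss.
    row = cache.get(key)
    if row is None:
        row = query_db(key)
        cache.set(key, row)
    return row
```

Repeated reads of a hot row then cost a dictionary lookup instead of a round trip to the DBMS, which is the point nate is making about putting memory to better use than the file system cache.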
On Jan 28, 2010, at 11:37 AM, "nate" centos@linuxpowered.net wrote:
Les Mikesell wrote:
I wonder if the generally horrible handling Linux has always done for fsync() is the real reason Oracle spun off their own distro? Do they get it better?
Anyone in their right mind with Oracle would be using ASM and direct I/O so I don't think it was related.
http://www.oracle.com/technology/pub/articles/smiley_10gdb_install.html#asm http://www.ixora.com.au/tips/avoid_buffered_io.htm
"The file system cache should be used to buffer non-Oracle I/O only. Using it to attempt to enhance the caching of Oracle data just wastes memory, and lots of it. Oracle can cache its own data much more effectively than the operating system can."
Which leads me back to my original response: forget about the file system cache. If you want performance, go for application-level caching, whether it's DB caching or other caching like the memcached mentioned by someone.
Oracle did it because they wanted to control the entire stack.
It isn't even the file system cache that is of concern here; fsync() properly flushes the file system buffers out of the kernel. The problem is that fsync() doesn't tell the disk drives themselves to flush their caches.
Even direct I/O by itself won't do the trick; the OS needs to make sure the disk drive empties its write cache, and currently barriers are the only way to make sure of that.
-Ross
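A rough sketch of what direct I/O involves, and why it still isn't enough on its own (Python with made-up names; O_DIRECT requires aligned buffers and lengths, and the fallback path is for filesystems like tmpfs that reject it):

```python
import mmap
import os

def direct_write(path, payload):
    # O_DIRECT bypasses the page cache but needs an aligned buffer and
    # an aligned length; an anonymous mmap is page-aligned, so the
    # payload is padded out to one 4096-byte block.
    buf = mmap.mmap(-1, 4096)
    buf[:len(payload)] = payload
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    try:
        fd = os.open(path, flags | os.O_DIRECT)
    except (AttributeError, OSError):
        # Filesystem (or platform) doesn't support O_DIRECT: fall back
        # to buffered I/O, which at least fsync() can flush.
        fd = os.open(path, flags)
    try:
        os.write(fd, buf)
        # Even on the O_DIRECT path this fsync() does not guarantee the
        # drive's own volatile cache is flushed; that needs barriers.
        os.fsync(fd)
    finally:
        os.close(fd)
        buf.close()

direct_write("/tmp/odirect_demo.bin", b"ledger entry")
```

The comment on the final fsync() is Ross's point: bypassing the page cache moves the problem to the drive cache, it doesn't solve it.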
Ross Walker wrote:
Even direct I/O by itself won't do the trick; the OS needs to make sure the disk drive empties its write cache, and currently barriers are the only way to make sure of that.
Well I guess by the same token nobody in their right mind would run an Oracle DB without a battery backed write cache :)
nate
On Jan 28, 2010, at 6:58 PM, "nate" centos@linuxpowered.net wrote:
Ross Walker wrote:
Even direct I/O by itself won't do the trick; the OS needs to make sure the disk drive empties its write cache, and currently barriers are the only way to make sure of that.
Well I guess by the same token nobody in their right mind would run an Oracle DB without a battery backed write cache :)
True, or at a minimum a UPS.
There are patches in the current kernel versions that add barrier support to LVM, but how long until they are backported to RHEL is anyone's guess.
There are concerns that everyone's currently fast-performing LVM file systems will suddenly become doggish once barrier support is included, and in some cases that will be true. Using a separate SSD device as a journal can help in some cases.
-Ross
There are concerns that everyone's currently fast-performing LVM file systems will suddenly become doggish once barrier support is included, and in some cases that will be true. Using a separate SSD device as a journal can help in some cases.
That's only if you are using ext3/ext4 and data=journal. Are SSDs faster than bbu nvram cards?
On Jan 28, 2010, at 7:25 PM, Christopher Chan <christopher.chan@bradbury.edu.hk> wrote:
There are concerns that everyone's currently fast-performing LVM file systems will suddenly become doggish once barrier support is included, and in some cases that will be true. Using a separate SSD device as a journal can help in some cases.
That's only if you are using ext3/ext4 and data=journal. Are SSDs faster than bbu nvram cards?
No, but for existing installations it is definitely better than the alternative.
-Ross
On Thursday, January 28, 2010 10:48 PM, Ross Walker wrote:
On Jan 27, 2010, at 7:50 PM, Christopher Chan <christopher.chan@bradbury.edu.hk> wrote:
Sorry to be the bearer of bad news, but on top of LVM on CentOS/RHEL the best assurance you're going to get is fsync(), meaning the data is out of the kernel, but probably still in the disk's write cache. Make sure you have a good UPS setup, so the disks can flush after main power loss.
Or turn off write caching...
Have you tried doing any kind of write with write caching turned off? It is so horribly slow as to be almost useless.
If they needed the performance in the first place, I doubt they would be using md raid1. You want performance and reliability? Hardware raid + bbu cache. Otherwise, turn off write caching unless the I/O path supports barriers.
If you need to turn write-caching off then I would start looking at SSD drives with capacitor-based caches.
How do those compare with bbu nvram cards for external data + metadata journaling?
On Jan 28, 2010, at 7:27 PM, Christopher Chan <christopher.chan@bradbury.edu.hk> wrote:
On Thursday, January 28, 2010 10:48 PM, Ross Walker wrote:
On Jan 27, 2010, at 7:50 PM, Christopher Chan <christopher.chan@bradbury.edu.hk> wrote:
Sorry to be the bearer of bad news, but on top of LVM on CentOS/RHEL the best assurance you're going to get is fsync(), meaning the data is out of the kernel, but probably still in the disk's write cache. Make sure you have a good UPS setup, so the disks can flush after main power loss.
Or turn off write caching...
Have you tried doing any kind of write with write caching turned off? It is so horribly slow as to be almost useless.
If they needed the performance in the first place, I doubt they would be using md raid1. You want performance and reliability? Hardware raid + bbu cache. Otherwise, turn off write caching unless the I/O path supports barriers.
Yes, but a lot of people jumped on the "SW RAID is just as good as or better than HW RAID" bandwagon, and then there is no battery-backed write cache at all.
If you need to turn write-caching off then I would start looking at SSD drives with capacitor-based caches.
How do those compare with bbu nvram cards for external data + metadata journaling?
Slightly slower than nvram, but they don't suffer from the write cache filling up under load.
-Ross
On Friday, January 29, 2010 08:50 AM, Ross Walker wrote:
On Jan 28, 2010, at 7:27 PM, Christopher Chan <christopher.chan@bradbury.edu.hk> wrote:
On Thursday, January 28, 2010 10:48 PM, Ross Walker wrote:
On Jan 27, 2010, at 7:50 PM, Christopher Chan <christopher.chan@bradbury.edu.hk> wrote:
Sorry to be the bearer of bad news, but on top of LVM on CentOS/RHEL the best assurance you're going to get is fsync(), meaning the data is out of the kernel, but probably still in the disk's write cache. Make sure you have a good UPS setup, so the disks can flush after main power loss.
Or turn off write caching...
Have you tried doing any kind of write with write caching turned off? It is so horribly slow as to be almost useless.
If they needed the performance in the first place, I doubt they would be using md raid1. You want performance and reliability? Hardware raid + bbu cache. Otherwise, turn off write caching unless the I/O path supports barriers.
Yes, but a lot of people jumped on the "SW RAID is just as good as or better than HW RAID" bandwagon, and then there is no battery-backed write cache at all.
That was long, long ago. Performance-wise, software raid was better than hardware, especially when it came to raid5, back when hardware raid cards were a joke using Intel i960 processors or had little to no cache, like the 3ware 75xx/85xx cards. Even then, though, hardware raid was good for raid1 because at that time, on 3ware anyway, md raid1 did not support partial resyncs and had to do a full resync if a disk fell off the array, while hardware raid implementations did support them and had less of a penalty.
Today you will only find software raid faster than hardware raid for small setups, 4 to 6 disks with raid1+0, before bus contention gets in the way, unless you have a motherboard like the one the Thumper uses. Until RHEL6 comes out, though, that is moot for data integrity unless you have a nice big umem nvram card, and you turn write caching off if you use lvm too.
If you need to turn write-caching off then I would start looking at SSD drives with capacitor-based caches.
How do those compare with bbu nvram cards for external data + metadata journaling?
Slightly slower than nvram, but they don't suffer from the write cache filling up under load.
Dunno about the nvram card filling up... it depends on the usage. Ext3/4 filesystems being used as mail queues will be laughing with an external nvram device for their journal in data=journal mode, unless most mails actually have to queue or are not sent off within five seconds (which is still adjustable, I believe).
Hi,
Split the TEXT/BLOB data out of the primary table into tables of their own, indexed to the primary table by its key column.
This is part of what I was planning to do; there is a lot of stuff I am planning to split out into tables of their own with a reference key. The problem is I'm unsure whether the added overhead of joins would negate the IO benefits, hence I'm trying to figure out more about how CentOS/Linux does the caching.
Think about distributing the parts to different boxes as necessary. You can start with the DBMS, which is the logical candidate.
Eventually I figured that would probably have to be done, but I don't know enough at this point. So I'm taking the approach of optimizing stage by stage, starting with things I'm more familiar with and less likely to muck up totally, i.e. from the app/script side first. Then, after getting more familiar with the setup, experiment with the hardware-based solutions.
On the DBMS backend, give it plenty of memory, good storage for the workload and good networking.
Again, the problem is it's an old server, so memory is maxed and the drive controller is probably not helping.
On the Apache/PHP side, look for a good DBMS inter-connect and some PHP caching module and of course enough CPU for the PHP code and network for Apache+DBMS inter-connect.
If you wanted to split it up even more you could look into some sort of PHP distributed cache/processing system and have PHP processed behind Apache.
Thanks for the heads-up, I didn't realize it was possible to separate the PHP processing from Apache itself. However, for the time being, I'm probably still limited to a single-server situation, so I will keep this in mind for the future.
On Jan 27, 2010, at 4:07 AM, Noob Centos Admin centos.admin@gmail.com wrote:
Hi,
Split the TEXT/BLOB data out of the primary table into tables of their own, indexed to the primary table by its key column.
This is part of what I was planning to do; there is a lot of stuff I am planning to split out into tables of their own with a reference key. The problem is I'm unsure whether the added overhead of joins would negate the IO benefits, hence I'm trying to figure out more about how CentOS/Linux does the caching.
The idea behind it is you don't need to execute a join if you don't need the extra data.
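The pattern can be sketched with SQLite standing in for the real DBMS (schema and column names are invented for illustration): the frequently used narrow columns stay in one table, the bulky TEXT/BLOB payload moves to a side table, and the join is only paid when the big field is actually wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hot table: the handful of columns most queries touch.
    CREATE TABLE orders_hot (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        total REAL,
        updated_at TEXT
    );
    -- Side table: the wide, rarely needed payload, keyed to the hot row.
    CREATE TABLE orders_blob (
        order_id INTEGER PRIMARY KEY REFERENCES orders_hot(id),
        notes TEXT
    );
""")
conn.execute("INSERT INTO orders_hot VALUES (1, 42, 'shipped', 9.99, '2010-01-27')")
conn.execute("INSERT INTO orders_blob VALUES (1, 'a very large free-text field...')")

# Common path: no join, narrow rows, more of them fit per cached block.
status = conn.execute("SELECT status FROM orders_hot WHERE id = 1").fetchone()[0]

# Rare path: pay for the join only when the big field is needed.
notes = conn.execute("""
    SELECT b.notes FROM orders_hot h
    JOIN orders_blob b ON b.order_id = h.id
    WHERE h.id = 1
""").fetchone()[0]
```

Because the hot table's rows are small, many more of them fit in whatever cache (page cache or DB buffer pool) is available, which is the IO benefit being weighed against the join cost.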
Think about distributing the parts to different boxes as necessary. You can start with the DBMS, which is the logical candidate.
Eventually I figured that would probably have to be done, but I don't know enough at this point. So I'm taking the approach of optimizing stage by stage, starting with things I'm more familiar with and less likely to muck up totally, i.e. from the app/script side first. Then, after getting more familiar with the setup, experiment with the hardware-based solutions.
Good approach. Take a look at the queries and indexes first; you can get a lot of optimization out of tuning queries and/or adding or re-indexing indexes.
On the DBMS backend, give it plenty of memory, good storage for the workload and good networking.
Again, the problem is it's an old server, so memory is maxed and the drive controller is probably not helping.
On the Apache/PHP side, look for a good DBMS inter-connect and some PHP caching module and of course enough CPU for the PHP code and network for Apache+DBMS inter-connect.
If you wanted to split it up even more you could look into some sort of PHP distributed cache/processing system and have PHP processed behind Apache.
Thanks for the heads-up, I didn't realize it was possible to separate the PHP processing from Apache itself. However, for the time being, I'm probably still limited to a single-server situation, so I will keep this in mind for the future.
I was actually thinking of distributing the caching of the data rather than the PHP processing, but you can have multiple PHP front-end servers, one or two mid-line caching (and possibly pre-processing) servers, and then a couple of backend DB servers (replicas) for reads and a master for writes.
You could even have all this within a single piece of highly redundant hardware running ESXi or even Xen with PV domains.
If you get a second piece of highly redundant hardware you can then look at vmotion or live migration to distribute the load between boxes and setup fail-over, etc.
-Ross
On 1/27/2010 8:30 AM, Ross Walker wrote:
This is part of what I was planning to do; there is a lot of stuff I am planning to split out into tables of their own with a reference key. The problem is I'm unsure whether the added overhead of joins would negate the IO benefits, hence I'm trying to figure out more about how CentOS/Linux does the caching.
The idea behind it is you don't need to execute a join if you don't need the extra data.
I've seen mysql do some really stupid things, like a full 3-table join into a (huge) disk temporary table when the select had a 'limit 10' and was ordered by one of the fields that had an index.
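One defensive habit is to check the plan before trusting the optimizer with ORDER BY ... LIMIT. A small sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (table and index names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b TEXT);
    CREATE INDEX idx_t_a ON t(a);
""")
conn.executemany("INSERT INTO t(a, b) VALUES (?, ?)",
                 [(i % 100, "x" * 10) for i in range(1000)])

# Ask for the plan: with the index on 'a', the engine can walk idx_t_a
# in order and stop after 10 rows instead of materializing and sorting
# a temporary table.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t ORDER BY a LIMIT 10"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan_rows)
```

If the plan shows a temporary sort table instead of the index, that's the moment to rewrite the query or adjust the indexes, rather than discovering it in production.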
If you wanted to split it up even more you could look into some sort of PHP distributed cache/processing system and have PHP processed behind Apache.
Thanks for the heads-up, I didn't realize it was possible to separate the PHP processing from Apache itself. However, for the time being, I'm probably still limited to a single-server situation, so I will keep this in mind for the future.
I was actually thinking of distributing the caching of the data rather than the PHP processing, but you can have multiple PHP front-end servers, one or two mid-line caching (and possibly pre-processing) servers, and then a couple of backend DB servers (replicas) for reads and a master for writes.
memcache is still the quick fix here. You can distribute the cache across any nearby machines regardless of whether or not you run php there. And you can often cache higher-level objects, like parts of the page that might be reused, to offload even more than the database activity.
On Wed, 2010-01-27 at 10:10 -0600, Les Mikesell wrote:
I've seen mysql do some really stupid things, like a full 3-table join into a (huge) disk temporary table when the select had a 'limit 10' and was ordered by one of the fields that had an index.
Very true. You can use logic operations to build up "tree sets" as temp tables, on disk and in RAM, using inner and outer joins, and index them the way you see fit. I have done this on InnoDB. These methods work on many "list" types, like edge and adjacency modeling. I did all the hard work on mssql, then moved to InnoDB, which in hindsight was a big mistake. Basically it comes down to a select; the select then gets built up into the temp table tree structure you need.
John