Hi all,
I have a server with 4GB of memory and a workload that lets it run mostly from memory, with very few disk reads. However, every night the backup disturbs that situation by reading every file on the system.
Is there a way to prevent this from happening? I'd like to be able to tell the kernel, hey this program is going to be reading a bunch of files serially, so please don't cache the data.
Is that possible?
Thanks, Steve
Steve Bergman wrote:
Hi all,
I have a server with 4GB of memory and a workload that lets it run mostly from memory, with very few disk reads. However, every night the backup disturbs that situation by reading every file on the system.
Is there a way to prevent this from happening? I'd like to be able to tell the kernel, hey this program is going to be reading a bunch of files serially, so please don't cache the data.
Is that possible?
I haven't found anything, but that would be a nice feature when dealing with a ton of data that only needs to be accessed once (like large backups or large file conversions).
I am rather surprised that this has not been dealt with by now. Why mess with the currently cached information when you know that the files being read will be many times larger than your memory/swap space and will only need to be read once?
Adam Gibson wrote:
I am rather surprised that this has not been dealt with by now. Why mess with the currently cached information when you know that the files being read will be many times larger than your memory/swap space and will only need to be read once?
It's possible to use the flag
O_DIRECT
http://homepages.cwi.nl/~aeb/linux/man2html/man2/open.2.html
It's up to your backup program if it wants to use the flag or not, but it is there.
-Andy
Andy Green wrote:
Adam Gibson wrote:
It's possible to use the flag
O_DIRECT
Thanks. :-)
In case anyone is interested, below is a patch against the current CentOS tar-1.14-9.RHEL4 SRPM.
I've tested it and it seems to do what I wanted.
1. Get the SRPM.
2. rpm -i tar-1.14-9.RHEL4.src.rpm
3. cd /usr/src/redhat/SPECS
4. rpmbuild -bp tar.spec
5. cd ../BUILD/tar-1.14/
6. patch -p1 < /path/to/patchdir/tar_no_cache.patch
7. ./configure
8. make
9. cp src/tar /usr/local/bin/tar_no_cache
tar_no_cache -c -v -f /dev/st0 .
================================================
--- src/create.c.orig 2006-04-24 15:39:00.000000000 -0500
+++ src/create.c 2006-04-24 16:28:13.000000000 -0500
@@ -1373,7 +1373,7 @@
   if (file_dumpable_p (st))
     {
       fd = open (st->orig_file_name,
-                 O_RDONLY | O_BINARY);
+                 O_RDONLY | O_BINARY | O_DIRECT);
       if (fd < 0)
         {
           if (!top_level && errno == ENOENT)
Steve Bergman wrote:
Andy Green wrote:
Adam Gibson wrote:
It's possible to use the flag
O_DIRECT
Thanks. :-)
--- src/create.c.orig 2006-04-24 15:39:00.000000000 -0500
+++ src/create.c 2006-04-24 16:28:13.000000000 -0500
@@ -1373,7 +1373,7 @@
   if (file_dumpable_p (st))
     {
       fd = open (st->orig_file_name,
-                 O_RDONLY | O_BINARY);
+                 O_RDONLY | O_BINARY | O_DIRECT);
       if (fd < 0)
         {
           if (!top_level && errno == ENOENT)
I don't think this is necessarily safe to do. O_DIRECT adds additional requirements to the memory buffer's alignment and file position alignments. Unless you have audited the 'tar' source code, I think this is a bad idea.
David
David Mansfield wrote:
I don't think this is necessarily safe to do. O_DIRECT adds additional requirements to the memory buffer's alignment and file position alignments. Unless you have audited the 'tar' source code, I think this is a bad idea.
Actually, I was just getting ready to follow up. Although the tar archive structure looks fine, and most of the files are OK, I HAVE FOUND CORRUPTION IN SOME FILES. So don't use that patch.
OK. If that strategy does not work, how about the earlier suggestion of fadvise? Would that be safer? And can anyone provide an example of how fadvise64_64() is actually used? I'm not a C programmer.
-Steve
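For readers looking for the fadvise example asked for above, here is a minimal sketch in C. It assumes the C library's posix_fadvise() wrapper is used rather than calling the fadvise64_64 system call directly. The helper name read_and_drop_cache and the 64 KiB buffer size are made up for illustration and are not taken from tar. The hints are advisory, so the kernel is free to ignore them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of tar): read a file sequentially, then
 * ask the kernel to drop the pages it cached for that file.
 * Returns the number of bytes read, or -1 on error. */
long long read_and_drop_cache(const char *path)
{
    char buf[64 * 1024];          /* arbitrary buffer size for this sketch */
    long long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Hint that access will be sequential. Offset 0 with length 0 means
     * "the whole file". The call is advisory, so failure is ignored. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* A backup tool would write buf to the archive here. */
        total += n;
    }

    /* Tell the kernel the cached pages of this file are no longer needed. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return n < 0 ? -1 : total;
}
```

A backup tool would do something like this once per file. One caveat: POSIX_FADV_DONTNEED applies to the file's cached pages regardless of which process read them in, so a file the regular workload was already using can be evicted as well.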
Steve Bergman wrote:
David Mansfield wrote:
I don't think this is necessarily safe to do. O_DIRECT adds additional requirements to the memory buffer's alignment and file position alignments. Unless you have audited the 'tar' source code, I think this is a bad idea.
Actually, I was just getting ready to follow up. Although the tar archive structure looks fine, and most of the files are OK, I HAVE FOUND CORRUPTION IN SOME FILES. So don't use that patch.
I wonder if the corruption has something to do with needing to ftruncate a file if the file does not end precisely at a block boundary (as the reference in my previous email mentions). I would have expected almost no files to end on a block boundary, so most of them should have been corrupt, but you say most of them were OK?
OK. If that strategy does not work, how about the earlier suggestion of fadvise? Would that be safer? And can anyone provide an example of how fadvise64_64() is actually used? I'm not a C programmer.
-Steve
Nice. Looks like the right solution.
David Mansfield wrote:
Steve Bergman wrote:
Andy Green wrote:
Adam Gibson wrote: It's possible to use the flag O_DIRECT
Thanks. :-)
...
I don't think this is necessarily safe to do. O_DIRECT adds additional requirements to the memory buffer's alignment and file position alignments. Unless you have audited the 'tar' source code, I think this is a bad idea. David
The comment by Linus at http://lwn.net/Articles/54041/ and the post at http://www.titov.net/2006/01/02/using-o_largefile-or-o_direct-on-linux/, which mentions memory buffer alignment issues when using O_DIRECT, do make the patch sound scary. The comment by Linus is from 2003, though, so I wonder whether the security issues (whatever they are) are still a concern.
"Have you ever noticed that O_DIRECT is a piece of crap?
The interface is fundamentally flawed, it has nasty security issues, it lacks any kind of sane synchronization, and it exposes stuff that shouldn't be exposed to user space.
I hope disk-based databases die off quickly. Yeah, I see where you are working, but where I'm coming from, I see all the _crap_ that Oracle tries to push down to the kernel, and most of the time I go "huh - that's a f**king bad design"." Linus
and
"And now some words about O_DIRECT. If you plan to use O_DIRECT your buffers should be aligned to pagesize (or sector size, depending on Linux version) and you should read/write in multiples of pagesize. To create buffer aligned at pagesize with size BUFFER_SIZE you can use code like this:
pagesize=getpagesize();
realbuff=malloc(BUFFER_SIZE+pagesize);
alignedbuff=((((int unsigned)realbuff+pagesize-1)/pagesize)*pagesize);
where pagesize is int and realbuff and alignedbuff are pointers. Realbuff is the pointer you should free() when you finished and alignedbuff is the buffer you should use with read/write.
You should also write only by multiples of pagesize. So if you want to create file that is not multiple of pagesize you should ftruncate the file when you finished writing." Anton Titov
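The pointer arithmetic quoted above casts the pointer to unsigned int, which would truncate addresses on 64-bit systems. As a hedged alternative, the sketch below gets a page-aligned buffer from posix_memalign() and rounds the length up to a multiple of the page size. The helper name alloc_page_aligned is hypothetical, and whether page alignment is sufficient for O_DIRECT on a given kernel and filesystem is an assumption you would need to verify.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_memalign() in <stdlib.h> */
#endif
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate a buffer whose start address and length
 * are both multiples of the system page size.
 * min_size    - smallest usable size the caller needs (must be > 0)
 * actual_size - receives the rounded-up length on success
 * Returns the buffer, or NULL on failure. Release it with free(). */
void *alloc_page_aligned(size_t min_size, size_t *actual_size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    size_t rounded;

    assert(actual_size != NULL);

    if (page <= 0 || min_size == 0)
        return NULL;

    /* Round the length up to the next multiple of the page size. */
    rounded = ((min_size + (size_t) page - 1) / (size_t) page) * (size_t) page;

    /* posix_memalign() returns 0 on success and leaves errno untouched. */
    if (posix_memalign(&buf, (size_t) page, rounded) != 0)
        return NULL;

    *actual_size = rounded;
    return buf;
}
```

Unlike the malloc() version, there is only one pointer to keep track of: the same pointer is passed to read/write and later to free().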