I hope someone familiar with the way Linux processes files can enlighten me on the following:
I recently replaced an old Windows 2000 server with a new machine running CentOS 5.2. It uses Samba 3.2.7 to serve a network of Windows XP clients.
We are a newspaper. We use Acrobat Distiller to batch-convert a folder of single-page PostScript files (for print) to a multipage PDF file (for electronic distribution). Running on a workstation, Distiller watches the folder on a Samba share and does the conversion, automatically creating bookmarks, indexes and other information.
On the Windows server, Distiller processes the files by filename order:
M09010901A001C.ps M09010901A002C.ps M09010901A003C.ps
... and so on.
On the Linux server, Distiller processes the files in an order that seems arbitrary, for example:
M09010901A021C.ps M09010901A005C.ps M09010901A015C.ps
... and so on.
The order Distiller uses is NOT related to the time stamp of the files. I tried to copy the files to the watched folder one by one in the correct order; the result is the same.
This creates the need to open the final PDF and reshuffle the pages by hand, which is very time consuming and prone to error.
There is a workaround to this: use the runfilex script that comes with Acrobat: it can contain a list of files to convert, in the order you want. Unfortunately, this is not acceptable for us since the process then takes about 40 minutes (irrespective of platform or filesystem), instead of 3 or 4 minutes.
My question is: how is the order of files determined by Linux when a particular order is not explicitly required by a program?
I noted the following:
I have 4 files in a folder: file1.ps, file2.ps, file3.ps, file4.ps. When I order them by date, they appear in Windows Explorer in, say, the following order: 3, 4, 1, 2. If I copy them to a new folder one by one in the order 1, 2, 3, 4, they will still appear in the order 3, 4, 1, 2 when ordered by date. So, what information is transported with the files that makes the Linux server present them to the world in this order?
Does someone know a workaround to this situation or can someone point me to information about file ordering with Linux? By the way, I am using the EXT3 file system. I tried the same on a VFAT file system and the result is the same. It seems to be a Linux thing, not a file system thing.
Thank you for your patience.
On Thu, 22 Jan 2009 20:28:41 +0000 Miguel Medalha wrote:
My question is: how is the order of files determined by Linux when a particular order is not explicitly required by a program?
http://www.linuxforums.org/forum/linux-newbie/111044-change-order-files-dire...
I have no idea if the script posted there works or not but I found that with a quick google search.
http://www.linuxforums.org/forum/linux-newbie/111044-change-order-files-dire...
I searched Google too, and I read that page. That doesn't work for us: the Windows users won't touch anything on the server (or Linux, for that matter) and I am not there every day. The file names change constantly. I cannot use a cron job because the time at which the original files are ready is not always the same. This is a newspaper and closure time is very, very busy. When the issue is ready, they proceed to the process I described.
Thank you for answering.
On Thu, 22 Jan 2009 20:46:29 +0000 Miguel Medalha wrote:
I searched Google too, and I read that page. That doesn't work for us: the Windows users won't touch anything on the server (or Linux, for that matter) and I am not there every day.
The Windows users wouldn't have to know that they are "touching" anything on the server. If that script will in fact work and getting it to run at the appropriate time is the only problem, then set up something from the Windows box to trigger it on your server. "Click the pretty icon right here". The pretty icon can set a flag or something on the server that your cron job can check for and run if present.
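For what it's worth, the flag idea could look roughly like the sketch below. It is only an illustration: the flag file name and the `run_if_flagged` helper are made up for this example, and the job it triggers is whatever reordering script you end up trusting. The cron job would call something like this every minute or so.

```python
import os
import tempfile


def run_if_flagged(flag_path, job):
    """Run job() once if the flag file exists; return True if it ran."""
    if not os.path.exists(flag_path):
        return False
    # Remove the flag first so the next cron run does not repeat the job.
    os.remove(flag_path)
    job()
    return True


# Demonstration in a scratch directory standing in for the Samba share.
with tempfile.TemporaryDirectory() as share:
    flag = os.path.join(share, "issue-ready.flag")  # hypothetical file name
    open(flag, "w").close()                         # what the icon would do
    ran = run_if_flagged(flag, lambda: print("reordering files..."))
    print("job ran:", ran)
```

The icon on the Windows side only has to create an empty file on the share, so the users never touch the server itself.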
Ok, that's a good tip. I can investigate that. Thank you.
Miguel Medalha wrote:
The order Distiller uses is NOT related to the time stamp of the files. I tried to copy the files to the watched folder one by one in the correct order; the result is the same.
Programs that read directories on their own normally find files in the order that they happen to appear in the directory. In a newly created directory, that would likely be in the order that the files were added, but in existing directories, slots previously used and now free may be reused in any order and this may not be consistent across filesystem types. If you are processing on the linux side and not via samba, and your program will take a list of files on the command line instead of groveling through the directory itself, you might simply start it with a wild-card filename on the command line. The shell will sort the list as it expands it so programs see the sorted list.
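To illustrate the difference, here is a small sketch of a program that sorts the wildcard matches itself instead of trusting directory order. The function name `sorted_ps_files` is just for this example, and it sorts by plain character codes, whereas a shell sorts according to the locale; for file names like yours the two should agree.

```python
import glob
import os
import tempfile


def sorted_ps_files(folder):
    """Return the *.ps paths in folder, sorted by name."""
    # glob.glob() makes no promise about order, so sort explicitly.
    pattern = os.path.join(glob.escape(folder), "*.ps")
    return sorted(glob.glob(pattern))


# Demonstration: create the files in scrambled order, list them sorted.
with tempfile.TemporaryDirectory() as folder:
    for name in ("M09010901A021C.ps", "M09010901A005C.ps", "M09010901A015C.ps"):
        open(os.path.join(folder, name), "w").close()
    for path in sorted_ps_files(folder):
        print(os.path.basename(path))
```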
There is a workaround to this: use the runfilex script that comes with Acrobat: it can contain a list of files to convert, in the order you want. Unfortunately, this is not acceptable for us since the process then takes about 40 minutes (irrespective of platform or filesystem), instead of 3 or 4 minutes.
That's very strange. Maybe you should look for a different tool. Won't ghostscript/psutils or OOo do this?
If you are processing on the linux side and not via samba, and your program will take a list of files on the command line instead of groveling through the directory itself, you might simply start it with a wild-card filename on the command line. The shell will sort the list as it expands it so programs see the sorted list.
The processing is done via Samba. Acrobat Distiller is not simply processing a list of files, it is consolidating a group of files onto a single file, discarding repeated graphic objects and creating a single subset of fonts from the various font subsets present on the original pages.
There is a workaround to this: use the runfilex script that comes with Acrobat: it can contain a list of files to convert, in the order you want. Unfortunately, this is not acceptable for us since the process then takes about 40 minutes (irrespective of platform or filesystem), instead of 3 or 4 minutes.
That's very strange. Maybe you should look for a different tool. Won't ghostscript/psutils or OOo do this?
The tools you quote do not apply in this case. I am not talking about office style PDFs, I am talking about full professional PDFs for printing presses, with embedded color profiles such as ISO Newspaper, JPEG2000 compression, bicubic resampling, etc. Not even Ghostscript does that kind of thing. I wish it did, but it doesn't.
Miguel Medalha wrote:
If you are processing on the linux side and not via samba, and your program will take a list of files on the command line instead of groveling through the directory itself, you might simply start it with a wild-card filename on the command line. The shell will sort the list as it expands it so programs see the sorted list.
The processing is done via Samba. Acrobat Distiller is not simply processing a list of files, it is consolidating a group of files onto a single file, discarding repeated graphic objects and creating a single subset of fonts from the various font subsets present on the original pages.
The quick/dirty fix might be to cifs-mount a windows directory where the linux side wants to see it and let the windows side work natively if that gives the behavior you want. Using the automounter might help if the windows side is not always available.
On Thu, Jan 22, 2009 at 3:20 PM, Les Mikesell lesmikesell@gmail.com wrote:
The quick/dirty fix might be to cifs-mount a windows directory where the linux side wants to see it and let the windows side work natively if that gives the behavior you want. Using the automounter might help if the windows side is not always available.
If you go that route, this wiki has useful tips:
http://wiki.centos.org/TipsAndTricks/WindowsShares
See section "3. Even-better method"
Akemi
Akemi Yagi wrote:
On Thu, Jan 22, 2009 at 3:20 PM, Les Mikesell lesmikesell@gmail.com wrote:
The quick/dirty fix might be to cifs-mount a windows directory where the linux side wants to see it and let the windows side work natively if that gives the behavior you want. Using the automounter might help if the windows side is not always available.
If you go that route, this wiki has useful tips:
http://wiki.centos.org/TipsAndTricks/WindowsShares
See section "3. Even-better method"
Or if you want to really go crazy, you might look at Alfresco (http://www.alfresco.com), which among other things implements a cifs server in java so you can apply an assortment of business rules to what the clients see as you might over http - although I don't really know if those rules include control over sort order.
Miguel Medalha wrote:
I hope someone familiar with the way Linux processes files can enlighten me on the following: ... On the Windows server, Distiller processes the files by filename order:
M09010901A001C.ps M09010901A002C.ps M09010901A003C.ps
Windows NTFS uses B-Tree for its directories so they are inherently alphabetically sorted.
On Thu, 2009-01-22 at 14:06 -0800, John R Pierce wrote:
Windows NTFS uses B-Tree for its directories so they are inherently alphabetically sorted.
If the linux FS is ext2, maybe the "dir_index" option of mke2fs will do what you want? See "man mke2fs". It says it uses hashed b-trees, but for speed.
<snip>
HTH
I just verified the filesystem features with tune2fs -l and the dir_index feature is already present. So, no luck here.
On Thu, 2009-01-22 at 20:28 +0000, Miguel Medalha wrote:
Does someone know a workaround to this situation or can someone point me to information about file ordering with Linux? By the way, I am using the EXT3 file system. I tried the same on a VFAT file system and the result is the same. It seems to be a Linux thing, not a file system thing.
---- You might want to look closely at the file names in Linux.
Windows is not case sensitive but Linux is.
In Windows, you cannot create the 2 files, TEST.DOC and test.doc in the same directory but in Linux you can. It may be that some of these files are stored differently as in file1.ps and FILE2.PS etc.
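A quick sketch of how case alone can change a sorted listing, using made-up file names. A plain sort compares character codes, so uppercase names come before lowercase ones; a case-insensitive sort gives the order a Windows user would expect.

```python
def case_sensitive_order(names):
    """Sort by character codes, so 'F' (0x46) comes before 'f' (0x66)."""
    return sorted(names)


def case_insensitive_order(names):
    """Sort ignoring case, as a Windows user would expect."""
    return sorted(names, key=str.lower)


names = ["file1.ps", "FILE2.PS", "file3.ps"]
print(case_sensitive_order(names))    # ['FILE2.PS', 'file1.ps', 'file3.ps']
print(case_insensitive_order(names))  # ['file1.ps', 'FILE2.PS', 'file3.ps']
```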
Also, you might want to check out some alternate settings...
dos filemode = yes (share setting only)
case sensitive = no (share setting only)
default case = lower (share setting only)
Craig
I am aware of the differences in case treatment between Linux and Windows. This is not related to case. The filenames in question are automated and ALWAYS take the following form:
M09010901A001C.ps M09010901A002C.ps M09010901A003C.ps
etc, etc.
(By the way, the Samba share settings are not "share only". According to the man pages, share settings can be used globally. The inverse is not true: global settings can only be used globally.)
Thank you for your answer, though.
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Miguel Medalha
Sent: Thursday, January 22, 2009 3:29 PM
To: CentOS mailing list; samba@lists.samba.org
Subject: [CentOS] OT? File order on CentOS/Samba server
http://code.google.com/p/samba-dirsort-vfs/
Did you try that? I think someone recommended it to you. If it does work (which I doubt it will in your situation), send me a personal mail. I still think your real problem lies in how your processing software orders the files, and I would have a really good look at that software. Why? Because The Gimp, which is OSS, handles file ordering with no problem.
JohnStanley
http://code.google.com/p/samba-dirsort-vfs/ Did you try that? I think someone recommended it to you.
Well, I did try to compile it but make fails on all the Linux computers I have access to. They all run CentOS 5.2. It would be nice to have a .rpm... I am a sysadmin, not a programmer, I am not able to solve most compile errors.
(...) think your real problem lies in your processing software in the file ordering. I would have a really good look at the software doing it.
The problem lies in EXT3. I discovered that if I mv the files to another directory the files will then appear on the samba shares in alphanumerical order and will be processed by Acrobat Distiller accordingly. The move can even be done by Windows Explorer working on the Samba share.
This seems a bit strange to me. Why doesn't EXT3 present the files in alphanumerical order after they are first created one by one but then presents them alphanumerically after a bulk move to another directory?
Also, I connected a FAT32 formatted USB flash drive to the server and directed Distiller to there. The files are correctly processed at the first trial. I suppose I will install a smallish FAT32 formatted IDE disk on the server just for this purpose.
Thank you to all who answered my questions. We form a great community indeed!
On Fri, 23 Jan 2009, Miguel Medalha wrote:
This seems a bit strange to me. Why doesn't EXT3 present the files in alphanumerical order after they are first created one by one but then presents them alphanumerically after a bulk move to another directory?
This sounds to me like the dir_index option was applied to a file system that didn't originally have it and an fsck -Df wasn't run at the time.
Steve
This sounds to me like the dir_index option was applied to a file system that didn't originally have it and an fsck -Df wasn't run at the time.
That may well be the most relevant information given here! I will *certainly* give it a try.
Thank you!
On Fri, 23 Jan 2009, Miguel Medalha wrote:
This sounds to me like the dir_index option was applied to a file system that didn't originally have it and an fsck -Df wasn't run at the time.
That may well be the most relevant information given here! I will *certainly* give it a try.
I based my speculation on some observations I had made on some of my own systems when I implemented dir_index. It so happens that, on that system at least, a "find /foo -print" returns the filenames in sorted order. Unfortunately, it isn't true on another system that I just checked. So now I will go and stand in the corner :(
Steve
I based my speculation on some observations I had made on some of my own systems when I implemented dir_index. It so happens that, on that system at least, a "find /foo -print" returns the filenames in sorted order. Unfortunately, it isn't true on another system that I just checked. So now I will go and stand in the corner :(
:)
Anyway, your tip gave me some precious direction. Monday I will investigate and then report. Thank you!
On Fri, 2009-01-23 at 19:43 +0000, Miguel Medalha wrote:
<snip>
(...) think your real problem lies in your processing software in the file ordering. I would have a really good look at the software doing it.
The problem lies in EXT3. I discovered that if I mv the files to another directory the files will then appear on the samba shares in alphanumerical order and will be processed by Acrobat Distiller accordingly. The move can even be done by Windows Explorer working on the Samba share.
This seems a bit strange to me. Why doesn't EXT3 present the files in alphanumerical order after they are first created one by one but then presents them alphanumerically after a bulk move to another directory?
In addition to the other reply about dir_index and fsck, keep in mind that a typical move (mv dir/* newdir/) will present the list of files in alphanumeric order to the mv/cp command, because the shell sorts the wildcard expansion. So regardless of the underlying order in the original directory, the order in the target directory should be alphanumeric.
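As a rough sketch of that move step done by a script rather than by the shell: the helper below (the name `move_in_name_order` is only for illustration) moves the files into a target directory one at a time in name order. Whether the target directory then lists them in that order still depends on the filesystem, so treat this as an experiment, not a guarantee.

```python
import os
import shutil
import tempfile


def move_in_name_order(src, dst):
    """Move regular files from src to dst in sorted name order."""
    os.makedirs(dst, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        if os.path.isfile(path):
            shutil.move(path, os.path.join(dst, name))
            moved.append(name)
    return moved


# Demonstration with scratch directories standing in for the two folders.
with tempfile.TemporaryDirectory() as root:
    incoming = os.path.join(root, "incoming")
    watched = os.path.join(root, "watched")
    os.makedirs(incoming)
    for name in ("M09010901A003C.ps", "M09010901A001C.ps", "M09010901A002C.ps"):
        open(os.path.join(incoming, name), "w").close()
    print(move_in_name_order(incoming, watched))
```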
In that case, I would expect your software, which apparently processes the directory itself, would see the stuff in the new directory in the desired order, as seems to be indicated by your results above.
Also, I connected a FAT32 formatted USB flash drive to the server and directed Distiller to there. The files are correctly processed at the first trial. I suppose I will install a smallish FAT32 formatted IDE disk on the server just for this purpose.
There has to be a better solution. Maybe the mv as a predecessor to the application processing would be acceptable, presuming the dir_index facility is really not working as hoped?
Thank you to all who answered my questions. We form a great community indeed!
<snip sig stuff>
I still think the dir_index _ought_ to do what you need it to do. But I've never had to depend on it for that purpose so it is just wishful supposition on my part.
I still think the dir_index _ought_ to do what you need it to do. But I've never had to depend on it for that purpose so it is just wishful supposition on my part.
I am now almost certain that dir_index will solve the problem. I already remotely did fsck -fD to that filesystem. Now I will have to wait for monday to do the Distiller stuff.
Thank you.
Miguel Medalha wrote:
I still think the dir_index _ought_ to do what you need it to do. But I've never had to depend on it for that purpose so it is just wishful supposition on my part.
I am now almost certain that dir_index will solve the problem. I already remotely did fsck -fD to that filesystem. Now I will have to wait for monday to do the Distiller stuff.
I thought dir_index worked with a hash of the filename. Without knowing the hash technique I wouldn't assume that the hash sort order would match the unhashed sort order - but it might.
Hi,
On Fri, Jan 23, 2009 at 15:29, Miguel Medalha miguelmedalha@sapo.pt wrote:
I am now almost certain that dir_index will solve the problem. I already remotely did fsck -fD to that filesystem.
I don't really think so... I believe dir_index is the default, your filesystem was probably already created with the dir_index option, and yet your files are out of order. Looking at the man page, it's sorted by the hash of the filename. The purpose is not to present you the files in order, but to make it quicker to open a file in a directory with a huge number of files.
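A toy illustration of why hash order need not look like name order. This is not the hash ext3 uses; `zlib.crc32` is only a stand-in chosen for the example, and `hash_order` is a made-up helper. Ordering names by a hash of the name yields some permutation of them, with no reason for it to match the alphabetical one.

```python
import zlib


def hash_order(names):
    """Order names by a stand-in hash (CRC32) of each name."""
    return sorted(names, key=lambda n: zlib.crc32(n.encode("utf-8")))


names = ["M09010901A%03dC.ps" % page for page in range(1, 9)]
print("by name:", sorted(names))
print("by hash:", hash_order(names))
```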
Now I will have to wait for monday to do the Distiller stuff.
You don't necessarily have to wait to see what the Distiller would do. "ls -U" shows the files unsorted, in the directory order, that is probably the order in which the Distiller is using them.
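If you prefer a script to eyeballing the listing, something along these lines reads the directory in the order the system returns it and tells you whether that happens to be name order. The function names are just for this sketch, and I am assuming that `os.scandir` hands back entries in the same order that "ls -U" shows.

```python
import os
import tempfile


def directory_order(folder):
    """Return entry names in the order the OS hands them back."""
    with os.scandir(folder) as entries:
        return [entry.name for entry in entries]


def is_name_sorted(folder):
    """True if the raw directory order already matches name order."""
    names = directory_order(folder)
    return names == sorted(names)


# Demonstration in a scratch directory; the result depends on the filesystem.
with tempfile.TemporaryDirectory() as folder:
    for name in ("M09010901A021C.ps", "M09010901A005C.ps", "M09010901A015C.ps"):
        open(os.path.join(folder, name), "w").close()
    print(directory_order(folder))
    print("already in name order:", is_name_sorted(folder))
```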
HTH, Filipe
Hi,
You might want to try to look into the Distiller side of things.
1) I believe you are using Rundirex.txt file to convert all the .ps's into one .pdf. This page from Adobe confirms that it will take the files in directory order under Windows:
http://kb.adobe.com/selfservice/viewContent.do?externalId=318674
"-- Acrobat Distiller for Windows will process the files in the order in which you put them into the folder and create the PDF pages in the order in which it processes the files."
"-- Acrobat Distiller for Mac OS will process the files in alphabetical order."
(one solution would be getting a mac, hehehe).
Strange that you never hit the wrong order problem before, since according to that page, you should...
2) That page also talks about Runfilex.ps file, which is basically the same, only you have to list each .ps file in the order you want them to be included. Any chance you could use this one instead of Rundirex? Is the list of included files fixed? Could the Runfilex.ps file be somehow generated on the server based on the list of files that are there (maybe by a CGI in a web interface) instead of copied by the guy?
3) From what I see, Rundirex.txt (even with a .txt extension) is a Postscript file. AFAIK, Postscript is a full programming language, I've even seen webservers written in Postscript. I'm sure there is a way to sort the list of files from inside Postscript. However, I don't know the language and wouldn't know how to do that, or even how to start looking for it. I searched on the web for someone that did implement this on Rundirex.txt specifically, but with no luck. Maybe someone else on the list will know Postscript, or you could try to look for it in a Postscript list, I'm sure the solution will exist there.
Good luck! And let us know how you fixed it!
Filipe
Hi,
On Fri, Jan 23, 2009 at 20:45, Filipe Brandenburger filbranden@gmail.com wrote:
- Rundirex.txt (even with a .txt extension) is a Postscript file. [...]
[...] way to sort the list of files from inside Postscript.
I think I did it.
Inside your Rundirex.txt, you should have this snippet:
/RunDir {                    % Uses PathName variable on the operand stack
  {
    /mysave save def         % Performs a save before running the PS file
    dup = flush              % Shows name of PS file being run
    RunFile                  % Calls built in Distiller procedure
    clear cleardictstack     % Cleans up after PS file
    mysave restore           % Restores save level
  } 255 string filenameforall
} def
Right? If so, then add the definition of a bubble sort routine before that (which I got from Wikipedia), and then modify /RunDir into the snippet below. Ghostscript has a .sort built-in that does exactly that, but I'm including it here as I don't know if Distiller will too.
% Bubble sort from Wikibooks page on PostScript
/mybubblesort {
  1 index length 1 sub -1 1 {
    2 index exch 2 copy get 3 copy   % arr proc arr i arr[i] arr i arr[i]
    0 1 3 index 1 sub {
      3 index 1 index get            % arr proc arr i arr[i] arr imax amax j arr[j]
      2 index 1 index 10 index exec {   % ... amax < arr[j]
        4 2 roll
      } if
      pop pop
    } for                            % arr proc arr i arr[i] arr imax amax
    4 -1 roll exch 4 1 roll put put
  } for
  pop
} bind def

/RunDir {                    % Uses PathName variable on the operand stack
  /nf 0 def                  % Reset counter for number of files
  {
    255 string copy          % Copy to a separate string (otherwise would be overwritten)
    /nf nf 1 add def         % Increment counter of number of files
  } 255 string filenameforall

  nf array astore            % Put all filenames in an array
  { lt } mybubblesort        % And sort it

  {
    /mysave save def         % Performs a save before running the PS file
    dup = flush              % Shows name of PS file being run
    RunFile                  % Calls built in Distiller procedure
    clear cleardictstack     % Cleans up after PS file
    mysave restore           % Restores save level
  } forall                   % Execute original procedure, but using sorted array
} def
Of course I did not test it with Distiller which I don't have... I did test the part of sorting the list of files with Ghostscript and it works.
Maybe word wrapping in the e-mail will ruin the snippet, if that's the case please let me know and I'll send it attached to you.
Let us know if that works!
Filipe
(...) add the definition of a bubble sort routine before that (which I got from Wikipedia), and then modify /RunDir into the snippet below. (...)
Thank you for caring to look for and post the code.
At first I became very excited about it. But then I tried it...
It does work. The problem is that it suffers from the same illness as runfilex does: it takes forever. The process starts very swiftly but each new processed page takes longer and longer until it all slows to a crawl. Worse yet, Distiller goes on to use enormous (> 90%) amounts of CPU time.
I just measured the process as follows, for the same set of files, corresponding to a 32 page publication in A3 format:
rundirex: 3m42s
runfilex: 1h29m54s
Wikipedia code: 1h14m55s
It would be faster with the computers we have at work (runfilex takes about 40m) but you can see the relative magnitudes here. It really is not an option for the stressful environment of a closing newspaper...
I suppose I will end up creating a FAT32 partition on the server just for this purpose.
Thank you again for pointing me to the "PostScript FAQ" Wikipedia page. It reminded me of the times when I was reading it on BBS'es with the help of a 2400 bps modem link... :-)
On Sat, 2009-01-24 at 20:24 +0000, Miguel Medalha wrote:
Thank you again for pointing me to the "PostScript FAQ" Wikipedia page. It reminded me of the times when I was reading it on BBS'es with the help of a 2400 bps modem link... :-)
---- and you thought that 2400 bps was fast too I bet. Having started at 300 bps, I was shocked at how fast 1200 bps was.
that was a couple of eons ago
Craig
and you thought that 2400 bps was fast too I bet. Having started at 300 bps, I was shocked at how fast 1200 bps was.
that was a couple of eons ago
That reminded me that I still used a 1200 one for a while, too. When the first 14,400 modems appeared, I could not believe the speed. The cost was almost that of gold. In fact, they were so expensive that I had to buy one 50-50 with a friend. An internal ISA one, because an internal one was a little cheaper. We then shared it: one week for me, one week for him. :-)
Oi Miguel,
On Sat, Jan 24, 2009 at 15:24, Miguel Medalha miguelmedalha@sapo.pt wrote:
Thank you for caring to look for and post the code.
No problem! Glad to help.
At first I became very excited about it. But then I tried it...
It does work. The problem is that it suffers from the same illness as runfilex does: it takes forever. The process starts very swiftly but each new processed page takes longer and longer until it all slows to a crawl. Worse yet, Distiller goes on to use enormous (> 90%) amounts of CPU time.
I just measured the process as follows, for the same set of files, corresponding to a 32 page publication in A3 format:
rundirex: 3m42s
runfilex: 1h29m54s
Wikipedia code: 1h14m55s
That is really weird, since it's only sorting a list before starting the processing, but once the processing is started, it does exactly the same in both cases (the only difference is that in one case "filenameforall" is used and in the other case "forall" is used over an array with the sorted list of files).
Do you have a support contract with Adobe? If you do, I think you should bring up this issue with them and try to figure out where the huge performance difference is coming from, since there should not be one.
I suppose I will end up creating a FAT32 partition on the server just for this purpose.
and:
I just turned dir_index OFF with tune2fs. Now the directory order is the same as the inode order. This makes the order of files predictable and in fact turns out to solve my problem.
With dir_index turned OFF on that filesystem, when a copy is made to another directory (even from Windows on a Samba share) the alphanumeric order is preserved. I will just ask the workstation operators to copy the PS files to a new folder when they are all ready. Distiller is watching that folder and will process the files in the normal way, using the rundirex file.
I don't think turning dir_index off will make the order as predictable as you want it. It may be a good enough work around for now, but it might lead to strange problems in the future that you may end up having to deal with again.
I would really advise you to investigate why when you list the files in the order you want in the input file it takes so long.
Boa Sorte! Filipe
That is really weird, since it's only sorting a list before starting the processing, but once the processing is started, it does exactly the same in both cases (the only difference is that in one case "filenameforall" is used and in the other case "forall" is used over an array with the sorted list of files).
Weird indeed. But repeatable and confirmed again and again. It doesn't happen only over the network. It happens at the local level too, with Distiller and all files on the same workstation.
I don't think turning dir_index off will make the order as predictable as you want it. It may be a good enough work around for now, but it might lead to strange problems in the future that you may end up having to deal with again.
I don't have a choice right now. Newspapers don't wait, you know :)
I would really advise you to investigate why when you list the files in the order you want in the input file it takes so long.
I will certainly investigate it. I am a curious mind, for the worse and for the better :)
Boa Sorte!
Agradeço. Saudações!
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Filipe Brandenburger
Sent: Saturday, January 24, 2009 8:13 PM
To: CentOS mailing list
Subject: Re: [CentOS] OT? File order on CentOS/Samba server
Hi Miguel,
On Sat, Jan 24, 2009 at 15:24, Miguel Medalha miguelmedalha@sapo.pt wrote:
Thank you for caring to look for and post the code.
No problem! Glad to help.
At first I became very excited about it. But then I tried it...
It does work. The problem is that it suffers from the same illness as runfilex does: it takes forever. The process starts very swiftly but each new processed page takes longer and longer until it all slows to a crawl. Worse yet, Distiller goes on to use enormous (> 90%) amounts of CPU time.
I just measured the process as follows, for the same set of files, corresponding to a 32-page publication in A3 format:
rundirex: 3m42s
runfilex: 1h29m54s
Wikipedia code: 1h14m55s
That is really weird, since it's only sorting a list before starting the processing, but once the processing is started, it does exactly the same in both cases (the only difference is that in one case "filenameforall" is used and in the other case "forall" is used over an array with the sorted list of files).
Do you have a support contract with Adobe? If you do, I think you should bring up this issue with them and try to figure out where the huge performance difference is coming from, since it should not.
I suppose I will end up creating a FAT32 partition on the server just for this purpose.
and:
I just turned dir_index OFF with tune2fs. Now the directory order is the same as the inode order. This makes the order of files predictable and in fact turns out to solve my problem.
With dir_index turned OFF on that filesystem, when a copy is made to another directory (even from Windows on a Samba share) the alphanumeric order is preserved. I will just ask the workstation operators to copy the PS files to a new folder when they are all ready. Distiller is watching that folder and will process the files in the normal way, using the rundirex file.
I don't think turning dir_index off will make the order as predictable as you want it. It may be a good enough work around for now, but it might lead to strange problems in the future that you may end up having to deal with again.
I would really advise you to investigate why when you list the files in the order you want in the input file it takes so long.
------
Filipe, it is possible it is taking so long to do a "sort" because, when doing it, it caches it on the client side of Distiller and also does it on the Samba server too, i.e. it sorts on both sides.
I have had this happen in .Net. When doing a sort in .Net the default is to sort on the client and the server.
JohnStanley
Filipe, it is possible it is taking so long to do a "sort" because when doing it, it caches it on the client side of Distiller also + does it on the Samba Server to. IE; Sorts on Both Sides.
I tried it, several times, on a standalone Windows workstation and the same happens. I am not saying that the sorting takes too much time; the whole process takes too much time.
And please note that it also happens with the runfilex.ps code provided by Adobe, which does not sort but instead presents Distiller with a list of files to process, instead of letting it rely on the dir order. Sorting is not the problem here.
Miguel Medalha wrote:
Filipe, it is possible it is taking so long to do a "sort" because when doing it, it caches it on the client side of Distiller also + does it on the Samba Server to. IE; Sorts on Both Sides.
I tried it, several times, on a standalone Windows workstation and the same happens. I am not saying that the sorting takes too much time; the whole process takes too much time.
And please note that it also happens with the runfilex.ps code provided by Adobe, which does not sort but instead presents Distiller with a list of files to process, instead of letting it rely on the dir order. Sorting is not the problem here.
Sounds like a bug in the program. Maybe it runs a separate instance for each page in that mode and doesn't release any memory until it is all finished. On something smaller or less complex it might not make much difference, but if the memory use pushes into swap it will take much longer.
By the way, yet another really-contorted workaround would be to run VMware server or virtualbox (both free) on the centos box with a windows guest to get a reliable NTFS network drive. If you have resources to spare on this server you could even run distiller there so you could shut down the workstations as soon as the final run starts. It isn't the most efficient way to do things, but I've had some running that way for years with no unexpected problems. The only inconvenience is that at least in the VMware server case on centos 5, whenever you update the kernel and reboot, you have to run a script that recompiles the vmware module before the guest will run.
Sounds like a bug in the program. Maybe it runs a separate instance for each page in that mode and doesn't release any memory until it is all finished. On something smaller or less complex it might not make much difference, but if the memory use pushes into swap it will take much longer.
Yes, that's what it seems to me. As I said before, it starts processing swiftly but soon each new page takes longer and longer until it crawls. CPU time reaches 98% and the memory footprint keeps increasing until the end of the process. This happens even on a standalone Windows workstation, not only over the network. I can report this to Adobe but I don't have too much hope about the attention such a large company is going to give to such an issue...
By the way, yet another really-contorted workaround would be to run VMware server or virtualbox (both free) on the centos box with a windows guest to get a reliable NTFS network drive. If you have resources to spare on this server you could even run distiller there so you could shut down the workstations as soon as the final run starts.
I thought of doing that but it really is not realistic at the moment in my environment. It is overkill. It would be much easier to put a small FAT32-formatted partition on the server just for that purpose. The PS files are not kept. After processing they are discarded; only the resulting PDF is used and archived. For now I will stick with an EXT3 partition with dir_index off and use rundirex like we always did. It works well this way: 3 to 4 minutes to render a complete publication.
Thank you for your tips. Even if I don't use them now, the information stays. Maybe it will be needed one of these days.
Hi,
You might want to try to look into the Distiller side of things.
That's what I always did. I am a DTP guy.
- I believe you are using Rundirex.txt file to convert all the .ps's into one .pdf. This page from Adobe confirms that it will take the files in directory order under Windows:
http://kb.adobe.com/selfservice/viewContent.do?externalId=318674
"-- Acrobat Distiller for Windows will process the files in the order in which you put them into the folder and create the PDF pages in the order in which it processes the files."
"-- Acrobat Distiller for Mac OS will process the files in alphabetical order."
(one solution would be getting a mac, hehehe).
Strange that you never hit the wrong order problem before, since according to that page, you should...
Regardless of what that paper says, Distiller has ALWAYS processed the files in alphabetical order under Windows. I have been doing so since 2000 and Acrobat Distiller 4. We are now at 9. I refer, of course, to the use of rundirex.
- That page also talks about Runfilex.ps file, which is basically the same, only you have to list each .ps file in the order you want them to be included.
I already addressed that on my first post. I tried runfilex.ps but then Distiller takes 30 to 40 minutes to do the same job that it now does in 3 to 4 minutes, which really is not an option for a newspaper at closing time.
I will do some more experiments, from the Distiller side and the Linux side, and I will report here.
Thank you for your answers.
Miguel Medalha wrote:
Regardless of what that paper says, Distiller has ALWAYS processed the files in alphabetical order under Windows. I have been doing so since 2000 and Acrobat Distiller 4. We are now at 9. I refer, of course, to the use of rundirex.
again, Windows NTFS directories are inherently stored in sorted order because they are B-Tree indexes on the filename.
if this distiller process is being run from a "DOS" batch job in Windows, you could perhaps use something like...
for /f %%F in ('dir /b /on *.ps') DO @\path\to\distiller .... %%F ....
to run it on all *.ps files in the current working directory in alphabetic order.
again, Windows NTFS directories are inherently stored in sorted order because they are B-Tree indexes on the filename.
if this distiller process is being run from a "DOS" batch job in Windows, you could perhaps use something like...
for /f %%F in ('dir /b /on *.ps') DO @\path\to\distiller .... %%F ....
to run it on all *.ps files in the current working directory in alphabetic order.
Please note that what Distiller is doing is not "run on all *.ps files in alphabetic order". If only that were the case, I wouldn't be here bothering people... Instructed by a special PS file, Distiller is running a set of complex operations on a group of files in alphabetic order.
I can modify that special PS file to make Distiller process the files in any order I want. The problem is that when the order is not provided by the filesystem itself, the process takes forever. That's why I was looking for a solution at the filesystem level. I was trying to understand the inner workings of EXT3 and looking for a workaround.
Thank you for your tip, though. Maybe some day I will need it.
Have you tried what the different codepages do to sort order in Samba?
Check out these options:
dos charset
unix charset
display charset
-Ross
You don't necessarily have to wait to see what the Distiller would do. "ls -U" shows the files unsorted, in the directory order, that is probably the order in which the Distiller is using them.
Yes, Distiller uses the directory order. I ran an experiment at home. I copied 10 files by hand, one by one, from Windows to a CentOS machine.
Copy order
------------
F08C.ps
F06C.ps
F03C.ps
F05C.ps
F10C.ps
F02C.ps
F07C.ps
F04C.ps
F01C.ps
F09C.ps

I obtained the following results.

EXT3 inode numbers (manually sorted here) match the copy order
-----------------------------------------------
6998658 F08C.ps
6998659 F06C.ps
6998660 F03C.ps
6998661 F05C.ps
6998662 F10C.ps
6998663 F02C.ps
6998664 F07C.ps
6998665 F04C.ps
6998666 F01C.ps
6998667 F09C.ps

EXT3 Directory Order (ls -U1)
--------------------------------
F04C.ps
F02C.ps
F03C.ps
F05C.ps
F09C.ps
F08C.ps
F10C.ps
F07C.ps
F01C.ps
F06C.ps

Distiller Order matches Directory order
-------------------------
F04C.ps
F02C.ps
F03C.ps
F05C.ps
F09C.ps
F08C.ps
F10C.ps
F07C.ps
F01C.ps
F06C.ps
I see that the directory order does not match the inode order (which is the same as the copy order). Would this be due to the current asynchronous nature of filesystem operations? Let's try that: I will now reboot the server machine with the sync option on filesystem mount.
...
Rebooted with sync on that filesystem. Copied the files again to a newly created dir, etc. The results are the same. Why doesn't the directory order reflect the inode order?
Time for further study. Thank you again!
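For anyone who wants to repeat the comparison above without sorting listings by hand, here is a minimal Python sketch. It reads one directory and returns the file names in three orders: raw directory order (what "ls -U" shows), inode order, and name order. The function name directory_orders is just an illustration, and it assumes it is run on the server itself, not through the Samba share.

```python
import os

def directory_orders(path):
    """Return (raw, by_inode, by_name) lists of the file names in path.

    raw      -- the order the filesystem hands the entries back
    by_inode -- the same names sorted by inode number
    by_name  -- the same names sorted alphabetically
    """
    with os.scandir(path) as entries:
        # Keep regular files only; remember each name with its inode number.
        found = [(e.name, e.inode()) for e in entries if e.is_file()]
    raw = [name for name, _ in found]
    by_inode = [name for name, _ in sorted(found, key=lambda pair: pair[1])]
    by_name = sorted(raw)
    return raw, by_inode, by_name
```

Calling directory_orders("/path/to/watched/folder") and printing the three lists side by side should show at a glance whether the directory order follows the inode order, the names, or neither.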
Rebooted with sync on that filesystem. Copied the files again to a newly created dir, etc. The results are the same. Why doesn't the directory order reflect the inode order?
Because of dir_index!
I just turned dir_index OFF with tune2fs. Now the directory order is the same as the inode order.
This makes the order of files predictable and in fact turns out to solve my problem.
With dir_index turned OFF on that filesystem, when a copy is made to another directory (even from Windows on a Samba share) the alphanumeric order is preserved. I will just ask the workstation operators to copy the PS files to a new folder when they are all ready. Distiller is watching that folder and will process the files in the normal way, using the rundirex file.
This solution is even better than the initial situation: since we can now predict the order in which the pages will be processed, we can manipulate the order at will by doing multi-phased copies to the folder, in any order we want, instead of being limited to the alphanumeric one provided by NTFS :-)
So "dir_index ON" (and my ignorance of the inner workings of EXT3) was to blame for this confusion, from the beginning!
What a trip this was (sometimes in circles)! Thank you very much to all who contributed! Great community!
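A toy illustration of why the listing looked arbitrary with dir_index on: as far as I understand it, an indexed EXT3 directory keeps its entries keyed by a hash of the file name, so a listing comes back in hash order, not in creation or name order. The sketch below only imitates that idea. MD5 is a stand-in chosen for convenience, NOT the hash EXT3 actually uses, and toy_hash_order is a made-up name.

```python
import hashlib

def toy_hash_order(names, seed=b""):
    """Order names by a hash of each name (stand-in for a hashed directory index).

    MD5 is used purely for illustration; it is not EXT3's hash function.
    """
    return sorted(names, key=lambda n: hashlib.md5(seed + n.encode("utf-8")).digest())

# The ten test files from the experiment, in alphabetical order.
names = [f"F{i:02d}C.ps" for i in range(1, 11)]

# The hashed order depends only on the names, not on the order of arrival.
hashed = toy_hash_order(names)
hashed_reversed_input = toy_hash_order(list(reversed(names)))
```

Both calls give the same list, which matches what I saw: copying the files one by one in a chosen order made no difference as long as dir_index was on.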
Miguel Medalha wrote:
(...) think your real problem lies in your processing software in the file ordering. I would have a really good look at the software doing it.
The problem lies in EXT3. I discovered that if I mv the files to another directory the files will then appear on the samba shares in alphanumerical order and will be processed by Acrobat Distiller accordingly. The move can even be done by Windows Explorer working on the Samba share.
This seems a bit strange to me. Why doesn't EXT3 present the files in alphanumerical order after they are first created one by one but then presents them alphanumerically after a bulk move to another directory?
Directories grow as they are filled the first time. If you use a shell script with a wildcard to do the move, the shell will sort the list on the command line as it expands it, so the names are linked into the new directory in sorted order. However if you repeat this in the same directory instead of creating new ones each time it may not continue to work as existing empty slots may be reused in a different order.
Also, I connected a FAT32-formatted USB flash drive to the server and directed Distiller to there. The files are correctly processed at the first trial. I suppose I will install a smallish FAT32-formatted IDE disk on the server just for this purpose.
Did you consider sharing a directory from the machine running distiller and cifs-mounting it on the linux side to get ntfs behavior? Also, I'm curious about the timing of the runs. It doesn't sound like the file operations are grouped atomically. How do you ensure that the whole set is present when distiller starts, or that only one set is present? If I were doing it, I'd probably create a new tmp directory for each set of files (which should fix the ordering as a side effect) and rename it to the expected name after all files are present so you see all of them or none. Or, I might put cygwin sshd on the windows box and use scp or rsync to copy the files over in a batch, then start the Distiller run (if you can start it from the command line).
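The "new tmp directory, then rename" idea could look roughly like the Python sketch below. It copies the .ps files in name order into a fresh staging directory next to the target, then renames the staging directory to the expected name, so a watcher sees either the whole set or nothing. The names publish_in_order and ".staging-" are made up for illustration. Whether the resulting directory order follows the copy order still depends on the filesystem (per this thread, EXT3 with dir_index off), and the permissions of the new directory may need adjusting for a Samba share.

```python
import os
import shutil
import tempfile

def publish_in_order(src_dir, final_dir, suffix=".ps"):
    """Copy matching files in name order into a fresh directory, then rename it.

    The staging directory is created beside final_dir so the rename stays on
    one filesystem. final_dir must not exist yet (or must be empty).
    """
    parent = os.path.dirname(os.path.abspath(final_dir))
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    names = sorted(n for n in os.listdir(src_dir) if n.endswith(suffix))
    for name in names:
        # Files are linked into the new directory one by one, in sorted order.
        shutil.copy2(os.path.join(src_dir, name), os.path.join(staging, name))
    # Single rename: the complete set appears under the expected name at once.
    os.rename(staging, final_dir)
    return names
```

A call such as publish_in_order("/data/ps_ready", "/data/watched/issue") would be run once all pages are present; both paths here are placeholders.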
Did you consider sharing a directory from the machine running distiller and cifs-mounting it on the linux side to get ntfs behavior?
That is out of the question. The Windows machines are graphic workstations which are not all connected all the time and the Distiller service is essential to the network.
Also, I'm curious about the timing of the runs. It doesn't sound like the file operations are grouped atomically. How do you ensure that the whole set is present when distiller starts, or that only one set is present?
This is a very peculiar implementation. As I said in my first post, we are a newspaper and, like all newspapers, we don't have a fixed time to close the edition. It closes when it is ready, that's all.
The PDFs for print are automatically produced one by one from PostScript files. The PS files fall on a folder watched by Acrobat Distiller and after being stable for more than 10 seconds the conversion begins. Each one contains only one page, which will then be joined to others to form a plan for a platesetter.
When all the pages have been produced, one of the graphics people places a special text file on a folder watched by Distiller and it begins to bulk process all the individual PS files: downsampling images, converting the color space to sRGB, consolidating font subsets, creating bookmarks and indexes, etc. The result is a multipage PDF for electronic distribution, containing the whole newspaper in the sRGB color space.
This always worked flawlessly until some days ago, when I replaced the win2k server with a new CentOS/Samba one. Everything worked better and faster except... the pages in this last PDF were in what seemed like a random order. Ordering them by hand is a time-consuming and error-prone process, especially when everybody is tired... Producing a newspaper is pretty tense work, you know.
The difficulty with the scripted solutions proposed here is that we cannot know in advance at what time this process will take place and what the number of pages involved will be. At the end of each issue every minute counts. A watching process would have to poll the status of the workflow for several hours with very small intervals, which would be a waste of processor cycles. And not a very elegant thing to do, I feel.
I am (for now...) convinced that the tip given to me here about dir_index and the use of fsck -fD will solve this problem. Monday I will know. It will be a loooong wait for me.
Thank you again.
Miguel Medalha wrote:
Did you consider sharing a directory from the machine running distiller and cifs-mounting it on the linux side to get ntfs behavior?
That is out of the question. The Windows machines are graphic workstations which are not all connected all the time and the Distiller service is essential to the network.
I was under the impression that the Distiller app was running under Windows. If it isn't, it doesn't make much sense for it to expect NTFS filesystem semantics.
When all the pages have been produced, one of the graphics people places a special text file on a folder watched by Distiller and it begins to bulk process all the individual PS files:
[...]
The difficulty with the scripted solutions proposed here is that we cannot know in advance at what time this process will take place and what the number of pages involved will be.
Can't the trigger operation of placing the special text file be replaced by that person starting the script instead (perhaps click a button on a web page or something similar)?
At the end of each issue every minute counts. A watching process would have to poll the status of the workflow for several hours with very small intervals, which would be a waste of processor cycles. And not a very elegant thing to do, I feel.
While I wouldn't call it elegant, filesystem caching makes such things efficient enough that you'll never notice them running. If you need a script that looks for a file to appear or expands a wildcard in a directory, go ahead and use one as long as you can sleep for at least a few seconds in the loop. It's cheaper than having a person rearrange something.
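To give an idea of how little such a watcher involves, here is a minimal polling sketch in Python. It checks for a trigger file and sleeps between checks. The name wait_for_file and the 5-second default interval are assumptions for illustration, not something taken from Distiller or Samba.

```python
import os
import time

def wait_for_file(path, interval=5.0, timeout=None):
    """Poll until path exists.

    Returns True once the file is seen, or False if timeout (in seconds)
    expires first. With timeout=None it waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        # Sleeping between checks keeps the CPU cost negligible.
        time.sleep(interval)
    return True
```

One existence check every few seconds over several hours adds up to only a few thousand cheap filesystem calls, most of them answered from cache.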
I was under the impression that the Distiller app was running under Windows. If it isn't, it doesn't make much sense for it to expect NTFS filesystem semantics.
Yes, Distiller is running under Windows. When pages start to get ready, one of the graphic operators opens Distiller on his/her workstation which then starts watching a folder *on the server*.
Can't the trigger operation of placing the special text file be replaced by that person starting the script instead (perhaps click a button on a web page or something similar)?
Yes, that would be a possibility. But those people have deeply rooted habits and they are not in the least technically minded. As such, I would prefer to keep a workflow that has been functioning very well.
(By the way, that "special text file" is a snippet of PostScript code that instructs Distiller on where to find the files and how to process them. It would be needed anyway.)
Perhaps this obstacle will be removed by applying the correct parameters to the EXT3 file system, as suggested by William Maltby and Steve Thompson above in this thread: the "dir_index" filesystem feature followed by a "fsck -fD". I will try this Monday.
Thank you for answering.
On Fri, Jan 23, 2009 at 2:43 PM, Miguel Medalha miguelmedalha@sapo.pt wrote:
http://code.google.com/p/samba-dirsort-vfs/ Did you try that? I think someone recommended it to you.
Well, I did try to compile it but make fails on all the Linux computers I have access to. They all run CentOS 5.2. It would be nice to have a .rpm... I am a sysadmin, not a programmer, I am not able to solve most compile errors.
I will have a hack at compiling it later on because I am very interested in it. If I manage to get it rolling I will send out a mail to you and update the thread here on the list. I have had great success with the clamav vfs module.
JohnStanley
Well, I did try to compile it but make fails on all the Linux computers I have access to. They all run CentOS 5.2. It would be nice to have a .rpm... I am a sysadmin, not a programmer, I am not able to solve most compile errors.
I will have a hack at compiling it later on because I am very interested in it. If I manage to get it rolling I will send out a mail to you and update the thread here on the list. I have had great success with the clamav vfs module.
That would be GREAT! Thank you!