This really belongs on a shell list rather than the centos list, but:
On Fri, Nov 25, 2011 at 1:05 PM, Timothy Madden <terminatorul@gmail.com> wrote:
So I create 20 pipes in my script with `mkfifo´ and connect the read end of each one to a new wget process for that fifo. The write end of each pipe is then connected to my script, with shell commands like `exec 18>>fifo_file_name´
Then my script outputs, in a loop, one line with a URL to each of the pipes, in turn, and then starts over again with the first pipe until there are no more URLs from the database client.
Much to my dismay I find that there is no concurrent / parallel download with the child `wget´ processes, and that for some strange reason only one wget process can download pages at a time, and after that process completes, another one can begin.
I believe the problem is with creating all the fifos and their readers first and then creating the writers.
What happens is that you create wget #1, which shares some open file descriptors with the parent shell.
Next you create wget #2, which (because it was forked from the parent shell) inherits every descriptor that the shell had open for wget #1, including the write end of its fifo. Repeat for the rest of the wget processes. By the time you have created the last one, each of them shares a set of descriptors with every one that was created ahead of it.
Thus, even though you write to the fifo for wget #2 and close it from the parent shell, it doesn't actually see EOF and begin processing the input until the corresponding descriptor shared by wget #1 is closed when wget #1 exits. wget #3 then doesn't see EOF until #2 exits (#3 would have waited for #1, too, except #1 is already gone by then). Then #4 waits for #3, etc.
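To see the effect in isolation, here is a minimal sketch (not your script). In it, `cat` stands in for wget, `sleep 2` stands in for any other child that happens to inherit the descriptor, and descriptor 3, the temp directory, and the example URL are arbitrary choices of mine. The reader gets its line right away, but it should not see EOF until the sleeping child exits, even though the parent has already closed its own copy.

```shell
#!/bin/sh
dir=$(mktemp -d)
fifo=$dir/demo.fifo
mkfifo "$fifo"

# Reader: exits only once it sees EOF on the fifo.
cat "$fifo" > "$dir/out" &
reader=$!

# The parent shell opens the write end on descriptor 3.
exec 3>"$fifo"

# Any child forked from here on inherits descriptor 3.
sleep 2 &

echo "http://example.invalid/" >&3
exec 3>&-               # the parent closes its copy of the write end

start=$(date +%s)
wait "$reader"          # blocks until every copy of the write end is closed
end=$(date +%s)
elapsed=$((end - start))

received=$(cat "$dir/out")
echo "reader got '$received' and saw EOF after about ${elapsed}s"
rm -rf "$dir"
```

If you change the `sleep 2 &` line to `sleep 2 3>&- &`, the reader should finish almost immediately, since nothing but the parent ever held the write end.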
So you're either going to need to do a lot more clever descriptor wrangling to make sure wget #1 is not holding open any descriptors visible to wget #2, or you're going to have to use a simpler concurrency scheme that doesn't rely on having all those fifos opened ahead of time.
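As a rough idea of what the descriptor wrangling could look like, the sketch below closes the already-open write descriptor when launching the next reader, using the `N>&-` redirection. Again `cat` stands in for wget, and the descriptor numbers, file names, and placeholder URLs are mine, not taken from your script.

```shell
#!/bin/sh
dir=$(mktemp -d)
mkfifo "$dir/a.fifo" "$dir/b.fifo"

# Reader A (stand-in for wget #1), then the shell's write end on fd 3.
cat "$dir/a.fifo" > "$dir/a.out" &
reader_a=$!
exec 3>"$dir/a.fifo"

# Reader B (stand-in for wget #2) is started with fd 3 closed (3>&-),
# so it does not keep fifo A's write end open.
cat "$dir/b.fifo" > "$dir/b.out" 3>&- &
reader_b=$!
exec 4>"$dir/b.fifo"

echo "url-for-a" >&3
echo "url-for-b" >&4

exec 3>&-
wait "$reader_a"        # returns without waiting for reader B
exec 4>&-
wait "$reader_b"

a_out=$(cat "$dir/a.out")
b_out=$(cat "$dir/b.out")
echo "A got: $a_out"
echo "B got: $b_out"
rm -rf "$dir"
```

With 20 readers you would have to close every previously opened write descriptor on each launch, which is where it gets tedious. For the simpler route, something along the lines of `xargs -P 20 -n 1 wget` reading the URL list on stdin avoids the fifos altogether (`-P` is a GNU/BSD extension to xargs, not POSIX).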
The child processes also run in parallel if I open the write end of the pipes first, and then start the wget processes for the read end.
Probably you inadvertently resolved the shared open descriptor problem by whatever change you made to the script to invert that ordering.