[CentOS] Pipes (fifos) not working concurrently

Sat Nov 26 02:34:57 UTC 2011
Bart Schaefer <barton.schaefer at gmail.com>

This really belongs on a shell list rather than the centos list, but:

On Fri, Nov 25, 2011 at 1:05 PM, Timothy Madden <terminatorul at gmail.com> wrote:
>
> So I create 20 pipes in my script with `mkfifo´ and connect the read end of
> each one to a new wget process for that fifo. The write end of each pipe is
> then connected to my script, with shell commands like `exec
> 18>>fifo_file_name´
>
> Then my script outputs, in a loop, one line with a URL to each of the
> pipes, in turn, and then starts over again with the first pipe until there
> are no more URLs from the database client.
>
> Much to my dismay I find that there is no concurrent / parallel download
> with the child `wget´ processes, and that for some strange reason only one
> wget process can download pages at a time, and after that process completes,
> another one can begin.

I believe the problem is that you create all the fifos and their
readers first, and only then create the writers.

What happens is that you create wget #1, which shares some file
descriptors with the parent shell.

Next you create wget #2, which (because it was forked from the parent
shell) inherits all the file descriptors the shell had open to wget
#1, including the write end of wget #1's fifo.  Repeat for all the
rest of the wgets.  By the time you have created the last one, each
wget holds copies of the descriptors the shell had open to every wget
started before it.

Thus, even though you write to the fifo for wget #2 and close it from
the parent shell, wget #2 doesn't actually see EOF and begin
processing its input until the corresponding descriptor inherited by
wget #1 is closed when wget #1 exits.  wget #3 then doesn't see EOF
until #2 exits (#3 would have waited for #1, too, except #1 is already
gone by then).  Then #4 waits for #3, and so on.
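You can reproduce that chain with a small sketch (hypothetical, not
the original script; `cat` stands in for wget so it runs anywhere):

```shell
#!/bin/sh
# Hypothetical repro of the EOF chain; `cat` stands in for wget.
tmp=$(mktemp -d)
mkfifo "$tmp/f1" "$tmp/f2"

cat "$tmp/f1" > "$tmp/out1" &   # reader #1
r1=$!
exec 8>"$tmp/f1"                # shell's write end of fifo 1

cat "$tmp/f2" > "$tmp/out2" &   # reader #2 is forked NOW, so it
                                # inherits fd 8 from the shell
exec 9>"$tmp/f2"

echo one >&8
exec 8>&-       # shell closes its write end of fifo 1 ...
sleep 1
kill -0 "$r1" 2>/dev/null && echo "reader 1 still blocked"
# ... but reader #1 sees no EOF: reader #2 still holds an
# inherited copy of the write end of fifo 1.

echo two >&9
exec 9>&-       # close fifo 2; reader #2 exits, releasing its
wait            # copy of fd 8, and only then does reader #1 finish
```

The `kill -0` probe reports that reader #1 is still alive after the
shell has closed its own write end -- exactly the behavior described
above.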

So you're either going to need to do a lot more clever descriptor
wrangling to make sure wget #1 is not holding open any descriptors
visible to wget #2, or you're going to have to use a simpler
concurrency scheme that doesn't rely on having all those fifos opened
ahead of time.
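One way to do that descriptor wrangling is to close the earlier write
ends explicitly in each child with `N>&-` redirections on the command
that starts it.  A sketch, again with `cat` standing in for wget:

```shell
#!/bin/sh
# Sketch of the descriptor-wrangling fix: each later reader is
# started with the earlier write ends explicitly closed, so no
# reader inherits another reader's pipe.
tmp=$(mktemp -d)
mkfifo "$tmp/f1" "$tmp/f2" "$tmp/f3"

cat "$tmp/f1" > "$tmp/o1" &               # reader #1: nothing to close yet
exec 7>"$tmp/f1"
cat "$tmp/f2" > "$tmp/o2" 7>&- &          # reader #2: close inherited fd 7
exec 8>"$tmp/f2"
cat "$tmp/f3" > "$tmp/o3" 7>&- 8>&- &     # reader #3: close fds 7 and 8
exec 9>"$tmp/f3"

# Round-robin one line to each fifo, then close all write ends.
echo a >&7; echo b >&8; echo c >&9
exec 7>&- 8>&- 9>&-
wait    # each reader sees EOF independently; all exit together
```

With the closes in place, closing the shell's write end is enough to
deliver EOF to the matching reader, since no sibling holds a stray
copy of it.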

> The child processes also run in parallel if I open the write end of the
> pipes first, and then start the wget processes for the read end.

Probably you inadvertently resolved the shared open descriptor problem
by whatever change you made to the script to invert that ordering.
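For what it's worth, merely inverting the order does not by itself
remove the sharing: write ends opened first (with `<>` below, so the
opens don't block waiting for a reader) are still inherited by every
reader forked afterwards, so they still have to be closed in each
child.  A sketch of that ordering, with the closes that make it work:

```shell
#!/bin/sh
# Write ends first, readers second; `cat` stands in for wget.
tmp=$(mktemp -d)
mkfifo "$tmp/f1" "$tmp/f2"
exec 7<>"$tmp/f1" 8<>"$tmp/f2"    # read-write open never blocks

# Without the 7>&- 8>&- closes, each reader would keep a writable
# copy of both fifos open and neither would ever see EOF.
cat "$tmp/f1" > "$tmp/o1" 7>&- 8>&- &
cat "$tmp/f2" > "$tmp/o2" 7>&- 8>&- &

echo a >&7; echo b >&8
exec 7>&- 8>&-
wait
```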