[CentOS] SIGPIPE in assorted apps after "yum update"

Mon Jul 7 01:44:58 UTC 2008
John Hanks <griznog at gmail.com>

Hello,

I have several systems which I recently updated with

yum -y update

to all the latest packages. These systems use yum-priorities with the
CentOS (priority 1), EPEL (priority 5) and rpmforge (priority 10)
repositories. After the updates, dhcpd stopped working: it dies with a
SIGPIPE shortly after it attempts to fork into the background. I worked
around that problem by building a new server with no additional repos,
only CentOS, and dhcpd works fine on that system. Since then I have
found the same problem, or a similar one, in a few more applications.
Here is the tail of an strace of pbs_mom as it attempts to fork into
the background:

listen(5, 512)                          = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(15003), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(6, 512)                          = 0
fcntl(4, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
clone(Process 23938 attached (waiting for parent)
Process 23938 resumed (parent 23937 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaad30db0) = 23938
[pid 23937] exit_group(0)               = ?
getsockname(3, 0x7fff6b7728a0, [128])   = -1 ENOTSOCK (Socket operation on non-socket)
fcntl(3, F_GETFD)                       = 0
dup(3)                                  = 7
fcntl(7, F_SETFD, 0)                    = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 8
close(3)                                = 0
fcntl(8, F_GETFD)                       = 0
dup2(8, 3)                              = 3
fcntl(3, F_SETFD, 0)                    = 0
close(8)                                = 0
write(3, "\25\3\1\0\22\334\362\36\233\253\205\2633\323\322q\4\3T\rxK\210", 23) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
Process 23938 detached
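The last calls in that trace (socket() returning a new descriptor, dup2() of it onto fd 3, then write() on fd 3) can be replayed on their own. The sketch below assumes Linux: it opens a TCP socket that is never connected, dup2()s it over a descriptor we already own (a /dev/null handle standing in for fd 3), and writes the same 23 bytes. The function name and the /dev/null stand-in are illustrative choices, not anything taken from dhcpd or pbs_mom. Python ignores SIGPIPE by default, so the failure surfaces as an EPIPE error instead of killing the interpreter.

```python
import errno
import os
import socket

# The 23 bytes from the first trace. strace prints non-printable bytes as
# octal escapes, which Python bytes literals read the same way.
PAYLOAD = b"\25\3\1\0\22\334\362\36\233\253\205\2633\323\322q\4\3T\rxK\210"


def write_via_fresh_socket(target_fd: int, payload: bytes) -> int:
    """Replay socket() / dup2() / write() from the trace (hypothetical helper).

    Returns the errno raised by write(), or 0 if the write succeeded.
    """
    fresh = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # never connected
    try:
        os.dup2(fresh.fileno(), target_fd)  # like dup2(8, 3)
    finally:
        fresh.close()  # like close(8); target_fd still refers to the socket
    try:
        os.write(target_fd, payload)  # like write(3, ..., 23)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    # Any descriptor we own can stand in for fd 3 from the trace.
    stand_in = os.open(os.devnull, os.O_WRONLY)
    try:
        err = write_via_fresh_socket(stand_in, PAYLOAD)
    finally:
        os.close(stand_in)
    print(len(PAYLOAD), errno.errorcode.get(err, err))
```

On Linux this should print `23 EPIPE`. A C daemon that has left SIGPIPE at its default action is instead terminated at that write, which is what the `--- SIGPIPE (Broken pipe) ---` line in the trace records.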


This is pretty much the same thing that happened to dhcpd. In both
cases the applications work fine in debug mode, when they don't
attempt to fork, but quietly die when run normally. A third set of
apps, the wrappers for the client part of torque (pbs_mom), such as
qstat, do this:

stat("/usr/local/sbin/pbs_iff", {st_mode=S_IFREG|S_ISUID|0755, st_size=21412, ...}) = 0
pipe([5, 6])                            = 0
clone(Process 24068 attached (waiting for parent)
Process 24068 resumed (parent 24067 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaad31ce0) = 24068
[pid 24067] close(6)                    = 0
[pid 24067] fcntl(5, F_GETFL)           = 0 (flags O_RDONLY)
[pid 24067] read(5,  <unfinished ...>
[pid 24068] getsockname(3, {sa_family=AF_INET, sin_port=htons(41855), sin_addr=inet_addr("129.123.148.49")}, [1164321820984213520]) = 0
[pid 24068] getpeername(3, {sa_family=AF_INET, sin_port=htons(636), sin_addr=inet_addr("129.123.20.92")}, [68719476752]) = 0
[pid 24068] fcntl(3, F_GETFD)           = 0x1 (flags FD_CLOEXEC)
[pid 24068] dup(3)                      = 7
[pid 24068] fcntl(7, F_SETFD, FD_CLOEXEC) = 0
[pid 24068] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 8
[pid 24068] close(3)                    = 0
[pid 24068] fcntl(8, F_GETFD)           = 0
[pid 24068] dup2(8, 3)                  = 3
[pid 24068] fcntl(3, F_SETFD, 0)        = 0
[pid 24068] close(8)                    = 0
[pid 24068] write(3, "\25\3\1\0\22\346h\357n\r\17x\374B\312\217\374x\276\311\217\342%", 23) = -1 EPIPE (Broken pipe)
[pid 24068] --- SIGPIPE (Broken pipe) @ 0 (0) ---
Process 24068 detached
<... read resumed> "", 4)               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(5)                                = 0
wait4(24068, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGPIPE}], 0, NULL) = 24068
close(4)                                = 0
write(2, "No Permission.\n", 15No Permission.
)        = 15
write(2, "qstat: cannot connect to server "..., 63qstat: cannot connect to server moab.hpc.usu.edu (errno=15007)
) = 63
exit_group(-1)                          = ?
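The parent's side of this second trace (pipe(), fork, a read() that returns 0 bytes once the child is gone, then wait4() reporting WTERMSIG == SIGPIPE) can be reproduced with a child that writes to an unconnected TCP socket. This is a sketch assuming Linux; the function name is hypothetical, and the child restores the default SIGPIPE action because Python sets that signal to "ignore" at startup.

```python
import os
import signal
import socket
from typing import Optional, Tuple


def fork_child_that_hits_sigpipe() -> Tuple[bytes, Optional[int]]:
    """Return (bytes the parent read from the pipe, signal that killed the child)."""
    read_end, write_end = os.pipe()  # like pipe([5, 6])
    pid = os.fork()  # like clone(...)
    if pid == 0:
        # Child: keep write_end open, as the child in the trace keeps fd 6.
        os.close(read_end)
        # Python ignores SIGPIPE at startup; restore the C default (terminate).
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # never connected
        try:
            os.write(sock.fileno(), b"x")  # EPIPE, and SIGPIPE terminates the child
        finally:
            os._exit(0)  # only reached if the signal did not kill the child
    os.close(write_end)  # like "[pid parent] close(6)"
    data = os.read(read_end, 4)  # like read(5, ..., 4): EOF once the child is gone
    os.close(read_end)
    _, status = os.waitpid(pid, 0)  # like wait4(pid, ...)
    sig = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return data, sig


if __name__ == "__main__":
    data, sig = fork_child_that_hits_sigpipe()
    print(repr(data), signal.Signals(sig).name if sig is not None else "not signaled")
```

The parent only sees an empty read and a signaled child, which is why qstat reports "No Permission." rather than anything about SIGPIPE. Run from a shell, a process killed this way has exit status 141 (128 + 13, SIGPIPE being signal 13 on Linux).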

Once again, the process dies with SIGPIPE right after a fork. Other
things running on these systems fork successfully, and I have been
unable to find any pattern, other than that the problem doesn't seem to
appear when I don't use the additional repos. That may be a
coincidence, though; I haven't repeated it enough yet to be certain.
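Since the only pattern so far involves the extra repositories, it may help to recall how yum-priorities chooses a source. The sketch below is a simplified model, not the plugin's code: for each package name, only the repositories carrying it at the lowest priority number are kept, and copies in higher-numbered repositories are excluded. It ignores architectures, obsoletes and the plugin's options. The repository ids and package names are hypothetical; the priorities 1, 5 and 10 are the ones from this message.

```python
from typing import Dict, List, Set, Tuple

# repo id -> (priority, package names); a lower number wins, as with yum-priorities.
Repos = Dict[str, Tuple[int, Set[str]]]


def allowed_sources(repos: Repos) -> Dict[str, List[str]]:
    """Simplified model: keep each package name only from the repo(s) with the
    lowest priority number that carry it; higher-numbered copies are excluded."""
    best: Dict[str, int] = {}
    for priority, names in repos.values():
        for name in names:
            best[name] = min(priority, best.get(name, priority))
    allowed: Dict[str, List[str]] = {}
    for repo, (priority, names) in repos.items():
        for name in names:
            if priority == best[name]:
                allowed.setdefault(name, []).append(repo)
    return {name: sorted(srcs) for name, srcs in allowed.items()}


if __name__ == "__main__":
    # Priorities from the message; repo ids and package names are hypothetical.
    repos: Repos = {
        "base": (1, {"shared-pkg"}),
        "epel": (5, {"epel-only-pkg"}),
        "rpmforge": (10, {"shared-pkg", "rpmforge-only-pkg"}),
    }
    for name, srcs in sorted(allowed_sources(repos).items()):
        print(name, srcs)
```

The model makes one consequence explicit: priorities only shadow packages that also exist in a lower-numbered repository, so a package that exists only in EPEL or rpmforge is still installed and updated from there.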

Any hints or suggestions would be appreciated. Unfortunately I only
noticed this after deciding it was "safe" to update *all* my machines,
so I'm suffering through a lot of rebuilds/restores.

Thanks,

jbh