Hi,
I have observed that certain Bazel test cases from the test suite are timing out on CentOS 7 (kernel version 3.10.0-123.el7.x86_64). For example, I tried bazel_coverage_test (using the command `bazel test //src/test/shell/bazel:bazel_coverage_test`) and observed that it just hangs. I tried tracing it using strace (log attached).
This seems to be CentOS-specific behavior, as I did not observe it on Ubuntu 16.04.
Has anybody observed this? Is this a regression as far as CentOS is concerned?
Thanks, Atul.
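Bazel runs a long-lived server process, and the `bazel` command is only a client for it, so an strace of the client may mostly show the client waiting. The log excerpt later in this thread shows the client reading `server/server.pid.txt` under its output base. A minimal sketch for attaching strace to the server instead, assuming a POSIX shell and that PID-file location; `bazel_server_pid` is a hypothetical helper, not part of Bazel:

```shell
#!/bin/sh
# Hypothetical helper (not part of Bazel): print the PID of the Bazel server
# that owns the given output base, as recorded in server/server.pid.txt.
bazel_server_pid() {
  pid_file="$1/server/server.pid.txt"
  if [ ! -r "$pid_file" ]; then
    echo "no readable PID file at $pid_file" >&2
    return 1
  fi
  # The file holds only the PID (5 bytes, no newline, in the traced run).
  cat "$pid_file"
}

# Usage sketch (needs bazel and strace; not executed here):
#   pid=$(bazel_server_pid "$(bazel info output_base)")
#   strace -f -tt -o bazel-server.strace -p "$pid"
#   # ...then run the hanging test from another terminal.
```

Attaching with `-p` to a process you did not start may require root privileges.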
On 17/05/17 12:29, Atul Sowani wrote:
I have observed that certain Bazel test cases from the test suite are timing out on CentOS 7 (kernel version 3.10.0-123.el7.x86_64). For example, I tried bazel_coverage_test (using the command `bazel test //src/test/shell/bazel:bazel_coverage_test`) and observed that it just hangs. I tried tracing it using strace (log attached).
If your kernel version is 3.10.0-123 then you are *way* backlevel. That is the original 7.0 kernel and the rest of your system is likely to also be at the same 2.5 year old level. You should `yum update` and retest on 7.3.
Trevor
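A quick way to tell whether a machine is still on the kernel Trevor describes is to match the output of `uname -r` against the `3.10.0-123` release. A minimal POSIX shell sketch; `is_el7_ga_kernel` is a hypothetical name, and it only recognizes release strings starting with `3.10.0-123.`:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel release string (as printed by
# `uname -r`) is a 3.10.0-123 kernel, i.e. the original CentOS 7.0 kernel.
is_el7_ga_kernel() {
  case "$1" in
    3.10.0-123.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch:
#   cat /etc/centos-release          # which point release is installed
#   if is_el7_ga_kernel "$(uname -r)"; then
#     echo "still on the 7.0 kernel: run 'yum update', reboot, and retest"
#   fi
```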
I upgraded the kernel to 4.11.1-1.el7.elrepo.x86_64 (#1 SMP Sun May 14 11:54:29 EDT 2017) and reran the tests. However, the result is unchanged: the test still hangs. I can provide the complete log file if required.
Something interesting happens around line 16698 of the log ("[pid 22714] _exit(0)"). Here is an excerpt:
[pid 22703] futex(0x7f7dd0000f20, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 22703] sendmsg(8, {msg_name(0)=NULL, msg_iov(3)=[{"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n", 24}, {"\0\0\22\4\0\0\0\0\0\0\2\0\0\0\0\0\3\0\0\0\0\0\4\0\0\377\377", 27}, {"\0\0\4\10\0\0\0\0\0\0\17\0\1", 13}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 64
[pid 22703] clock_gettime(CLOCK_MONOTONIC, {1155, 662752484}) = 0
[pid 22703] poll([{fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN|POLLOUT}], 3, 999) = 1 ([{fd=8, revents=POLLOUT}])
[pid 22703] clock_gettime(CLOCK_MONOTONIC, {1155, 662807734}) = 0
[pid 22703] poll([{fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 999 <unfinished ...>
[pid 22714] clock_gettime(CLOCK_MONOTONIC, {1155, 662865341}) = 0
[pid 22714] sendmsg(8, {msg_name(0)=NULL, msg_iov(16)=[{"\0\1\22\1\4\0\0\0\1@\7", 11}, {":scheme", 7}, {"\4", 1}, {"http", 4}, {"@\7", 2}, {":method", 7}, {"\4", 1}, {"POST", 4}, {"@\5", 2}, {":path", 5}, {"\"", 1}, {"/command_server.CommandServer/Pi"..., 34}, {"@\n", 2}, {":authority", 10}, {"\v[::1]:42219@\r", 14}, {"grpc-encoding", 13}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 118
[pid 22714] sendmsg(8, {msg_name(0)=NULL, msg_iov(16)=[{"\10", 1}, {"identity", 8}, {"@\24", 2}, {"grpc-accept-encoding", 20}, {"\25", 1}, {"identity,deflate,gzip", 21}, {"@\2", 2}, {"te", 2}, {"\10", 1}, {"trailers", 8}, {"@\f", 2}, {"content-type", 12}, {"\20", 1}, {"application/grpc", 16}, {"@\n", 2}, {"user-agent", 10}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 109
[pid 22714] sendmsg(8, {msg_name(0)=NULL, msg_iov(8)=[{"%", 1}, {"grpc-c++/0.13.0 grpc-c/0.13.0 (l"..., 37}, {"@\f", 2}, {"grpc-timeout", 12}, {"\00310S\0\0\4\10\0\0\0\0\1\0\0", 15}, {"\377\377\0\0%\0\1\0\0\0\1\0\0\0\0", 15}, {" ", 1}, {"\n\03646fec02652d766b966d8d8da6f32ae", 32}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 115
[pid 22714] madvise(0x7f7dd6606000, 8368128, MADV_DONTNEED) = 0
[pid 22714] _exit(0) = ?
[pid 22703] <... poll resumed> ) = 1 ([{fd=8, revents=POLLIN}])
[pid 22714] +++ exited with 0 +++
recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\f\4\0\0\0\0\0\0\3\177\377\377\377\0\4\0\20\0\0\0\0\4\10\0\0\0\0\0\0\17"..., 8192}], msg_controllen=0, msg_flags=0}, 0) = 183
sendmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\4\1\0\0\0\0", 9}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
recvmsg(8, 0x7ffc26b44450, 0) = -1 EAGAIN (Resource temporarily unavailable)
open("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/server/server.pid.txt", O_RDONLY) = 9
read(9, "11702", 32) = 5
read(9, "", 27) = 0
close(9) = 0
readlink("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/install", "/root/.cache/bazel/_bazel_root/i"..., 4096) = 71
open("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/server/cmdline", O_RDONLY) = 9
read(9, "bazel(root)\0-XX:+HeapDumpOnOutOf"..., 4096) = 948
read(9, "", 4096) = 0
close(9) = 0
unlink("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/javalog.properties") = 0
open("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/javalog.properties", O_WRONLY|O_CREAT|O_TRUNC, 0755) = 9
write(9, "handlers=java.util.logging.FileH"..., 380) = 380
close(9) = 0
readlink("/proc/11702/cwd", "/root/bazel", 4096) = 11
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {0, 32832517}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {0x420c70, [INT], SA_RESTORER|SA_RESTART, 0x7f7dd8d12250}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTERM, {0x420c70, [TERM], SA_RESTORER|SA_RESTART, 0x7f7dd8d12250}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPIPE, {0x420c70, [PIPE], SA_RESTORER|SA_RESTART, 0x7f7dd8d12250}, {SIG_DFL, [], 0}, 8) = 0
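Reading the excerpt: the first `sendmsg` on fd 8 starts with the HTTP/2 client connection preface (`PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n`), and the `:path` and `content-type: application/grpc` headers show the client making a gRPC call to the Bazel server at `[::1]:42219`. Every HTTP/2 frame begins with a 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags and a 31-bit stream identifier (RFC 7540, section 4.1). A small POSIX shell sketch that decodes such a header from byte values; `h2_frame_header` is a hypothetical name, and strace's octal escapes have to be converted to decimal first (`\22` is 18, `\10` is 8):

```shell
#!/bin/sh
# Hypothetical helper: decode a 9-byte HTTP/2 frame header (RFC 7540, 4.1)
# given as nine decimal byte values.
h2_frame_header() {
  [ "$#" -eq 9 ] || { echo "need 9 byte values" >&2; return 1; }
  length=$(( ($1 << 16) | ($2 << 8) | $3 ))
  case "$4" in
    0) type=DATA ;;          1) type=HEADERS ;;
    2) type=PRIORITY ;;      3) type=RST_STREAM ;;
    4) type=SETTINGS ;;      5) type=PUSH_PROMISE ;;
    6) type=PING ;;          7) type=GOAWAY ;;
    8) type=WINDOW_UPDATE ;; 9) type=CONTINUATION ;;
    *) type="unknown($4)" ;;
  esac
  # The top bit of the stream identifier is reserved and is ignored here.
  stream=$(( (($6 & 127) << 24) | ($7 << 16) | ($8 << 8) | $9 ))
  echo "length=$length type=$type flags=$5 stream=$stream"
}

# First iov of pid 22714's first sendmsg, "\0\1\22\1\4\0\0\0\1":
h2_frame_header 0 1 18 1 4 0 0 0 1   # length=274 type=HEADERS flags=4 stream=1
```

By the same rule, `\0\0\22\4\0\0\0\0\0` decodes as an 18-byte SETTINGS frame on stream 0, and `\0\0\0\4\1\0\0\0\0` as an empty SETTINGS frame with the ACK flag (1) set; flag 4 on the HEADERS frame is END_HEADERS.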
Thanks, Atul.
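When the complete log is too long to read by hand, one way to see where things stop is to list the system calls that strace marked `<unfinished ...>` and that never get a matching `<... resumed>` line or a process exit. A rough sketch in POSIX awk, assuming a log written by `strace -f` into a single file (interleaved `[pid N]` prefixes, as in the excerpt above); `unfinished_calls` is a hypothetical name, and unprefixed lines are all lumped under the label `main`, which can mis-pair calls when strace drops the prefix partway through a log:

```shell
#!/bin/sh
# Hypothetical helper: read an `strace -f` log (file arguments or stdin) and
# print, for each PID, the last system call that started but never completed.
unfinished_calls() {
  awk '
    {
      pid = "main"                         # lines without a [pid N] prefix
      line = $0
      if (match(line, /^\[pid +[0-9]+] /)) {
        pid = substr(line, 1, RLENGTH)
        gsub(/[^0-9]/, "", pid)            # keep only the PID digits
        line = substr(line, RLENGTH + 1)   # drop the prefix
      }
      if (line ~ /<unfinished \.\.\.>$/) {
        pending[pid] = line                # call started, result not yet seen
      } else if (line ~ /^<\.\.\. [A-Za-z0-9_]+ resumed>/ || line ~ /^\+\+\+ /) {
        delete pending[pid]                # call resumed, or process exited
      }
    }
    END { for (p in pending) print p ": " pending[p] }
  ' "$@"
}

# Usage sketch: unfinished_calls bazel.strace
```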
On 17/05/17 12:29, Atul Sowani wrote:
I have observed that certain Bazel test cases from the test suite are timing out on CentOS 7 (kernel version 3.10.0-123.el7.x86_64). For example, I tried bazel_coverage_test (using the command `bazel test //src/test/shell/bazel:bazel_coverage_test`) and observed that it just hangs. I tried tracing it using strace (log attached).
What even is Bazel? And where is this test running?
Bazel is a build system used by many current projects (TensorFlow is one of them): https://github.com/bazelbuild/bazel
On Fri, May 19, 2017 at 6:00 PM, Karanbir Singh mail-lists@karan.org wrote:
--
Karanbir Singh
+44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
GnuPG Key : http://www.karan.org/publickey.asc
_______________________________________________
CentOS-devel mailing list
CentOS-devel@centos.org
https://lists.centos.org/mailman/listinfo/centos-devel
On 18.5.2017 at 11.50, Atul Sowani wrote:
I upgraded the kernel to 4.11.1-1.el7.elrepo.x86_64 (#1 SMP Sun May 14 11:54:29 EDT 2017) and reran the tests. However, the result is unchanged: the test still hangs. I can provide the complete log file if required.
You really should have run "yum update" to get your system updated, instead of installing a non-CentOS kernel. This makes it more difficult for others to reproduce your results. In addition to the kernel, you may also have glibc and other libraries from the 7.0 era. A simple "yum update" would have fixed that.
My recommendation would be to set up a fresh CentOS 7.3 system, run "yum update" without installing any 3rd party repositories, reboot, and then try to reproduce the issue on that system.
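To confirm which point release a test machine is actually on before reproducing, the release file can be parsed. A minimal POSIX shell sketch; `centos_point_release` is a hypothetical name, and it assumes the usual `/etc/centos-release` format, e.g. `CentOS Linux release 7.3.1611 (Core)`:

```shell
#!/bin/sh
# Hypothetical helper: print the major.minor CentOS release (for example "7.3")
# parsed from /etc/centos-release, or from another file passed as $1.
centos_point_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' "${1:-/etc/centos-release}"
}

# Usage sketch on the test machine (not executed here):
#   [ "$(centos_point_release)" = "7.3" ] || echo "not on 7.3 yet"
#   yum repolist enabled   # confirm that only CentOS repositories are enabled
```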
On Fri, May 19, 2017 at 07:16:18PM +0530, Atul Sowani wrote:
Bazel is a build system used by many current projects (TensorFlow is one of them): https://github.com/bazelbuild/bazel
I discovered Bazel when building TensorFlow: https://people.centos.org/~tru/bazel-centos7 (quick and dirty).
Bazel will download the internet into your ~/.cache/bazel and make its mess there.
IMHO, this should be asked on the Bazel GitHub project; I have no idea what the test is doing...
$ wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist...
$ unzip -d bazel-0.4.5-dist bazel-0.4.5-dist.zip # DO put a -d XXXXX otherwise you will be sorry...
$ cd bazel-0.4.5-dist && bazel test //src/test/shell/bazel:bazel_coverage_test
...
..........
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 test target...
INFO: From Compiling third_party/ijar/platform_utils.cc:
third_party/ijar/platform_utils.cc: In function 'bool devtools_ijar::write_file(const char*, mode_t, const void*, size_t)':
third_party/ijar/platform_utils.cc:66:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  if (write(fd, data, size) != size) {
                               ^
INFO: From Compiling third_party/ijar/ijar.cc:
third_party/ijar/ijar.cc: In member function 'virtual bool devtools_ijar::JarStripperProcessor::Accept(const char*, devtools_ijar::u4)':
third_party/ijar/ijar.cc:66:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  if (filename_len >= CLASS_EXTENSION_LENGTH) {
                      ^
TIMEOUT: //src/test/shell/bazel:bazel_coverage_test (see /home/tru/.cache/bazel/_bazel_tru/350854196609f07e0af44ce02f5d2839/execroot/bazel/bazel-out/local-fastbuild/testlogs/src/test/shell/bazel/bazel_coverage_test/test.log).
Target //src/test/shell/bazel:bazel_coverage_test up-to-date:
  bazel-bin/src/test/shell/bazel/bazel_coverage_test
INFO: Elapsed time: 634.627s, Critical Path: 582.47s
//src/test/shell/bazel:bazel_coverage_test TIMEOUT in 1 out of 2 in 305.0s
  /home/tru/.cache/bazel/_bazel_tru/350854196609f07e0af44ce02f5d2839/execroot/bazel/bazel-out/local-fastbuild/testlogs/src/test/shell/bazel/bazel_coverage_test/test.log
Executed 1 out of 1 test: 1 fails locally.
Cheers
Tru
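The `TIMEOUT:` line in output like the above carries both the target and the path of its `test.log`, which is the first place to look for what the test was doing when it was killed. A small sed sketch for pulling those out of saved output; `timed_out_tests` is a hypothetical name, and it assumes the exact `TIMEOUT: //target (see /path/test.log).` format shown above:

```shell
#!/bin/sh
# Hypothetical helper: from saved `bazel test` output (files or stdin), print
# "<target> <test.log path>" for every test reported as TIMEOUT.
timed_out_tests() {
  sed -n 's|^TIMEOUT: \(//[^ ]*\) (see \(.*\))\.$|\1 \2|p' "$@"
}

# Usage sketch (bazel not run here): rerun with a longer per-test timeout
# (in seconds), save the output, then look at the end of each test log.
#   bazel test --test_timeout=1800 --test_output=errors \
#     //src/test/shell/bazel:bazel_coverage_test 2>&1 | tee bazel-test.out
#   timed_out_tests bazel-test.out | while read -r target log; do
#     tail -n 50 "$log"
#   done
```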
Tru, Anssi, thanks for confirming the issue. I will check those cached Bazel files.
BTW, `yum update` on my original CentOS installation did nothing, so I was forced to use the alternate kernel.
Best regards, Atul.
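Whether yum really sees nothing to install can be confirmed from the exit status of `yum check-update`, which is 100 when updates are available, 0 when there are none, and 1 on error. A minimal sketch; `describe_check_update` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: translate the exit status of `yum check-update`
# (100 = updates available, 0 = none, 1 = error) into a message.
describe_check_update() {
  case "$1" in
    0)   echo "no updates available" ;;
    100) echo "updates available" ;;
    *)   echo "yum check-update failed (status $1)" ;;
  esac
}

# Usage sketch (needs yum; not executed here):
#   yum check-update >/dev/null; describe_check_update "$?"
#   yum repolist enabled   # if nothing is offered, check which repos are enabled
```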
On May 19, 2017 10:15 PM, "Tru Huynh" tru@centos.org wrote:
-- Tru Huynh http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xBEFA581B