Hi,
I have observed that certain Bazel test cases from the test suite are timing out on CentOS 7 (kernel version 3.10.0-123.el7.x86_64). For example, I ran bazel_coverage_test (using the command bazel test //src/test/shell/bazel:bazel_coverage_test) and it just hangs. I traced it with strace (log attached).
This seems to be CentOS-specific behavior, as I did not observe it on Ubuntu 16.04.
Has anybody observed this? Is this a regression as far as CentOS is concerned?
Thanks, Atul.
On 17/05/17 12:29, Atul Sowani wrote:
[...]
If your kernel version is 3.10.0-123 then you are *way* backlevel. That is the original 7.0 kernel and the rest of your system is likely to also be at the same 2.5 year old level. You should `yum update` and retest on 7.3.
Trevor
I upgraded the kernel version to 4.11.1-1.el7.elrepo.x86_64 #1 SMP Sun May 14 11:54:29 EDT 2017 and performed the tests. However, the result remains unchanged: the test still hangs. I can provide the complete log file if required.
There is something interesting going on around line 16698 ("[pid 22714] _exit(0)") in the log file excerpt.
[pid 22703] futex(0x7f7dd0000f20, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 22703] sendmsg(8, {msg_name(0)=NULL, msg_iov(3)=[{"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n", 24}, {"\0\0\22\4\0\0\0\0\0\0\2\0\0\0\0\0\3\0\0\0\0\0\4\0\0\377\377", 27}, {"\0\0\4\10\0\0\0\0\0\0\17\0\1", 13}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 64
[pid 22703] clock_gettime(CLOCK_MONOTONIC, {1155, 662752484}) = 0
[pid 22703] poll([{fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN|POLLOUT}], 3, 999) = 1 ([{fd=8, revents=POLLOUT}])
[pid 22703] clock_gettime(CLOCK_MONOTONIC, {1155, 662807734}) = 0
[pid 22703] poll([{fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}], 3, 999 <unfinished ...>
[pid 22714] clock_gettime(CLOCK_MONOTONIC, {1155, 662865341}) = 0
[pid 22714] sendmsg(8, {msg_name(0)=NULL, msg_iov(16)=[{"\0\1\22\1\4\0\0\0\1@\7", 11}, {":scheme", 7}, {"\4", 1}, {"http", 4}, {"@\7", 2}, {":method", 7}, {"\4", 1}, {"POST", 4}, {"@\5", 2}, {":path", 5}, {"\"", 1}, {"/command_server.CommandServer/Pi"..., 34}, {"@\n", 2}, {":authority", 10}, {"\v[::1]:42219@\r", 14}, {"grpc-encoding", 13}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 118
[pid 22714] sendmsg(8, {msg_name(0)=NULL, msg_iov(16)=[{"\10", 1}, {"identity", 8}, {"@\24", 2}, {"grpc-accept-encoding", 20}, {"\25", 1}, {"identity,deflate,gzip", 21}, {"@\2", 2}, {"te", 2}, {"\10", 1}, {"trailers", 8}, {"@\f", 2}, {"content-type", 12}, {"\20", 1}, {"application/grpc", 16}, {"@\n", 2}, {"user-agent", 10}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 109
[pid 22714] sendmsg(8, {msg_name(0)=NULL, msg_iov(8)=[{"%", 1}, {"grpc-c++/0.13.0 grpc-c/0.13.0 (l"..., 37}, {"@\f", 2}, {"grpc-timeout", 12}, {"\00310S\0\0\4\10\0\0\0\0\1\0\0", 15}, {"\377\377\0\0%\0\1\0\0\0\1\0\0\0\0", 15}, {" ", 1}, {"\n\03646fec02652d766b966d8d8da6f32ae", 32}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 115
[pid 22714] madvise(0x7f7dd6606000, 8368128, MADV_DONTNEED) = 0
[pid 22714] _exit(0) = ?
[pid 22703] <... poll resumed> ) = 1 ([{fd=8, revents=POLLIN}])
[pid 22714] +++ exited with 0 +++
recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\f\4\0\0\0\0\0\0\3\177\377\377\377\0\4\0\20\0\0\0\0\4\10\0\0\0\0\0\0\17"..., 8192}], msg_controllen=0, msg_flags=0}, 0) = 183
sendmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\4\1\0\0\0\0", 9}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
recvmsg(8, 0x7ffc26b44450, 0) = -1 EAGAIN (Resource temporarily unavailable)
open("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/server/server.pid.txt", O_RDONLY) = 9
read(9, "11702", 32) = 5
read(9, "", 27) = 0
close(9) = 0
readlink("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/install", "/root/.cache/bazel/_bazel_root/i"..., 4096) = 71
open("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/server/cmdline", O_RDONLY) = 9
read(9, "bazel(root)\0-XX:+HeapDumpOnOutOf"..., 4096) = 948
read(9, "", 4096) = 0
close(9) = 0
unlink("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/javalog.properties") = 0
open("/root/.cache/bazel/_bazel_root/038a9c24c67a3f14ac28680c554d9af8/javalog.properties", O_WRONLY|O_CREAT|O_TRUNC, 0755) = 9
write(9, "handlers=java.util.logging.FileH"..., 380) = 380
close(9) = 0
readlink("/proc/11702/cwd", "/root/bazel", 4096) = 11
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {0, 32832517}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {0x420c70, [INT], SA_RESTORER|SA_RESTART, 0x7f7dd8d12250}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTERM, {0x420c70, [TERM], SA_RESTORER|SA_RESTART, 0x7f7dd8d12250}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPIPE, {0x420c70, [PIPE], SA_RESTORER|SA_RESTART, 0x7f7dd8d12250}, {SIG_DFL, [], 0}, 8) = 0
Thanks, Atul.
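When reading a long `strace -f` log for a hang, one thing that may help is listing the calls that strace marked `<unfinished ...>` and never reported as `resumed`, since those are the calls each thread was still blocked in when the trace ended. The following is only a rough sketch: the function name `pending_syscalls` and the file name `strace.log` are made up for illustration, and it assumes every relevant line carries the `[pid N]` prefix shown in the excerpt above (lines without a prefix are ignored).

```shell
#!/bin/sh
# Hypothetical helper: for each pid, print the last syscall that strace
# marked "<unfinished ...>" and that has no later "<... NAME resumed>" line.
pending_syscalls() {
    awk '
        /^\[pid +[0-9]+\]/ {
            pid = $2; sub(/\]$/, "", pid)          # "22703]" -> "22703"
            if ($0 ~ /<unfinished \.\.\.>/) {
                call = $3; sub(/\(.*/, "", call)   # "poll([{fd=6," -> "poll"
                pending[pid] = call
            } else if ($0 ~ /<\.\.\. .* resumed>/) {
                delete pending[pid]                # the call completed
            }
        }
        END { for (p in pending) print p, pending[p] }
    ' "$@"
}

# Usage (file name is an assumption):
#   pending_syscalls strace.log
```

In the excerpt above the `poll` in pid 22703 is resumed a few lines later, so it would not be reported; only calls still outstanding at the end of the log show up.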
On 18.5.2017 at 11.50, Atul Sowani wrote:
[...]
You really should have run "yum update" to get your system updated, instead of installing a non-CentOS kernel. This makes it more difficult for others to reproduce your results. In addition to the kernel, you may also have glibc and other libraries from the 7.0 era. A simple "yum update" would have fixed that.
My recommendation would be to set up a fresh CentOS 7.3 system, run "yum update" without installing any 3rd party repositories, reboot, and then try to reproduce the issue on that system.
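Before retesting, it may be worth confirming which kernel is actually running. The sketch below classifies a kernel release string using only the two values quoted in this thread; the function name `classify_kernel` and its labels are invented for illustration, and anything it does not recognize is simply reported as "other".

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel release string (as printed by
# `uname -r`) using only the two releases mentioned in this thread.
classify_kernel() {
    case "$1" in
        3.10.0-123.el7.*) echo "original-7.0" ;;  # the original 7.0 kernel
        *.elrepo.*)       echo "elrepo" ;;        # third-party ELRepo build
        *)                echo "other" ;;
    esac
}

echo "running kernel: $(uname -r) ($(classify_kernel "$(uname -r)"))"
# On CentOS this file names the installed point release.
cat /etc/centos-release 2>/dev/null || echo "no /etc/centos-release found"
```

A result of "original-7.0" or "elrepo" would suggest the system is not yet in the state recommended above.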
What even is Bazel? And where is this test running?
Bazel is a build system used by many current projects (TensorFlow is one of them): https://github.com/bazelbuild/bazel
On Fri, May 19, 2017 at 6:00 PM, Karanbir Singh mail-lists@karan.org wrote:
[...]
On Fri, May 19, 2017 at 07:16:18PM +0530, Atul Sowani wrote:
Bazel is a build system used by many current projects (TensorFlow is one of them): https://github.com/bazelbuild/bazel
I discovered that when building TensorFlow: https://people.centos.org/~tru/bazel-centos7 (quick and dirty).
Bazel will download the internet into your ~/.cache/bazel and make its mess there.
IMHO, this should be asked on the Bazel GitHub project; I have no idea what the test is doing...
$ wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist.zip
$ unzip -d bazel-0.4.5-dist bazel-0.4.5-dist.zip # DO put a -d XXXXX otherwise you will be sorry...
$ cd bazel-0.4.5-dist && bazel test //src/test/shell/bazel:bazel_coverage_test
...
..........
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 test target...
INFO: From Compiling third_party/ijar/platform_utils.cc:
third_party/ijar/platform_utils.cc: In function 'bool devtools_ijar::write_file(const char*, mode_t, const void*, size_t)':
third_party/ijar/platform_utils.cc:66:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (write(fd, data, size) != size) {
^
INFO: From Compiling third_party/ijar/ijar.cc:
third_party/ijar/ijar.cc: In member function 'virtual bool devtools_ijar::JarStripperProcessor::Accept(const char*, devtools_ijar::u4)':
third_party/ijar/ijar.cc:66:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (filename_len >= CLASS_EXTENSION_LENGTH) {
^
TIMEOUT: //src/test/shell/bazel:bazel_coverage_test (see /home/tru/.cache/bazel/_bazel_tru/350854196609f07e0af44ce02f5d2839/execroot/bazel/bazel-out/local-fastbuild/testlogs/src/test/shell/bazel/bazel_coverage_test/test.log).
Target //src/test/shell/bazel:bazel_coverage_test up-to-date:
bazel-bin/src/test/shell/bazel/bazel_coverage_test
INFO: Elapsed time: 634.627s, Critical Path: 582.47s
//src/test/shell/bazel:bazel_coverage_test TIMEOUT in 1 out of 2 in 305.0s
/home/tru/.cache/bazel/_bazel_tru/350854196609f07e0af44ce02f5d2839/execroot/bazel/bazel-out/local-fastbuild/testlogs/src/test/shell/bazel/bazel_coverage_test/test.log
Executed 1 out of 1 test: 1 fails locally.
Cheers
Tru
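To find the per-test log quickly in output like the above, the `TIMEOUT:` lines can be filtered out of a saved copy of the console output. This is a minimal sketch that assumes the line format shown in Tru's run; the function name `timed_out_tests` and the file name `bazel-test.out` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: from saved `bazel test` console output, print each
# timed-out target followed by the test.log path reported for it.
# Assumes lines of the form: TIMEOUT: //target (see /path/to/test.log).
timed_out_tests() {
    sed -n 's|^TIMEOUT: \(//[^ ]*\) (see \([^)]*\))\.\{0,1\}$|\1 \2|p' "$@"
}

# Usage (file name is an assumption):
#   bazel test //src/test/shell/bazel:bazel_coverage_test 2>&1 | tee bazel-test.out
#   timed_out_tests bazel-test.out
```

The reported test.log is the place to look for what the test was doing when it was killed. Rerunning with a larger value for Bazel's `--test_timeout` option may also help tell a slow test from one that is truly hung.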
Tru, Anssi, thanks for confirming the issue. I will check those cached Bazel files.
BTW, `yum update` on my original CentOS installation did nothing at all, hence I was forced to use the alternate kernel.
Best regards, Atul.
On May 19, 2017 10:15 PM, "Tru Huynh" tru@centos.org wrote:
[...]
-- Tru Huynh http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xBEFA581B
CentOS-devel mailing list CentOS-devel@centos.org https://lists.centos.org/mailman/listinfo/centos-devel