Hi *
I recently tried to compile a custom centos-stream-9 automotive SIG kernel and failed.
Now I see the very same problem is present in the mainline centos-stream-9 kernel.
When I clone the sources from https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9 on a freshly installed centos-stream-9 host, the make dist-all-rpms target fails with:
make[3]: rsync: Argument list too long
make[3]: *** [Makefile:768: test_progs-no_alu32-extras] Error 127
This is certainly not a problem with rsync and the ulimits are definitely sufficient to deal with this ~2000 char argument list.
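For anyone who wants to check the same thing on their own host, here is a minimal sketch of how the argument size can be compared against the system limit. The helper name arg_bytes and the sample paths are made up for illustration, and the count only approximates what the kernel charges for an exec (it ignores the per-argument pointers).

```shell
# arg_bytes ARGS...: bytes the arguments occupy as NUL-terminated strings,
# which is roughly how execve() accounts for them.
arg_bytes() {
    printf '%s\0' "$@" | wc -c | tr -d ' '
}

# Upper bound for argv plus environment on this host.
arg_max=$(getconf ARG_MAX)

# Hypothetical short command line, just to show the helper at work.
sample_bytes=$(arg_bytes rsync -aq /tmp/a /tmp/b)

echo "ARG_MAX:      $arg_max bytes"
echo "sample argv:  $sample_bytes bytes"
```

On a typical Linux host ARG_MAX is in the megabyte range, so an argument list of about 2000 characters should be nowhere near it.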
So what is going on here? This rsync call appears to come from tools/testing/selftests/bpf/Makefile. The problem persists if I set BUILDOPTS="-selftests", and I understand bpf to be a SKIP_TARGET by default anyway.
Does anyone have a hint? What is missing or what am I missing?
Best regards, Sebastian Hetze
On Sat, Mar 1, 2025 at 6:49 AM Sebastian Hetze shetze@redhat.com wrote:
Error 127 means command not found.
Did you run "dnf builddep kernel" to install all build dependencies?
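A quick way to see where 127 comes from, and to confirm that rsync is on the PATH of the build user, could look like the sketch below. The command name hypothetical_missing_command_xyz is invented on purpose so that the lookup fails. Whether make also reports 127 for other exec failures is an assumption on my side, not something this snippet proves.

```shell
# A POSIX shell returns 127 when it cannot find the command.
missing_status=0
sh -c 'hypothetical_missing_command_xyz' 2>/dev/null || missing_status=$?
echo "status for a missing command: $missing_status"

# Check whether rsync can be found at all.
if command -v rsync >/dev/null 2>&1; then
    rsync_found=yes
else
    rsync_found=no
fi
echo "rsync found on PATH: $rsync_found"
```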
-- 真実はいつも一つ!/ Always, there's only one truth!
Hi Neal,
thank you for sharing your thoughts.
The dependencies listed in the kernel.spec are resolved, so this is not the cause of the problem.
Here is how to reproduce:
1. Build a centos-stream-9 host as a VM.
   1. The VM is running on a Fedora host; I think this should not matter.
   2. In order to shorten the turnaround time, the VM has some substantial resources: 12 cores, 32 GB of RAM. This may be important.
2. Prepare for development tasks, as described in Building an AutoSD image that includes a custom kernel https://sigs.centos.org/automotive/building/building_an_os_image_that%20uses_a_custom_kernel/ :
   $ sudo dnf update
   $ sudo dnf groupinstall "Development Tools"
   $ sudo dnf install ncurses-devel bison flex elfutils-libelf-devel openssl-devel dwarves
3. Clone the centos-stream-9 kernel sources:
   $ git clone https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9.git
4. Install the build deps (again as described for the AutoSD image build):
   $ cd centos-stream-9
   $ make -j$(nproc) dist-srpm
   $ sudo dnf config-manager --set-enabled crb
   $ sudo dnf builddep redhat/rpm/SPECS/kernel.spec
5. Build the kernel packages:
   $ make -j$(nproc) DISTLOCALVERSION=_at BUILDOPTS="-selftests +verbose" dist-all-rpms 2>&1 | tee -a output01
The failing rsync command is (one argument per line for readability):
rsync -aq \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/urandom_read \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/bpf_testmod.ko \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/bpf_test_no_cfi.ko \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/bpf_test_modorder_x.ko \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/bpf_test_modorder_y.ko \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/liburandom_read.so \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/xdp_synproxy \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/sign-file \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/uprobe_multi \
  ima_setup.sh \
  verify_sig_setup.sh \
  progs/btf_dump_test_case_bitfields.c \
  progs/btf_dump_test_case_multidim.c \
  progs/btf_dump_test_case_namespacing.c \
  progs/btf_dump_test_case_ordering.c \
  progs/btf_dump_test_case_packing.c \
  progs/btf_dump_test_case_padding.c \
  progs/btf_dump_test_case_syntax.c \
  /home/she/Workspace/centos-stream-9/redhat/rpm/BUILD/kernel-5.14.0-570_at.el9/linux-5.14.0-570_at.el9.x86_64/tools/testing/selftests/bpf/no_alu32/
This should explain the file not found error. And obviously the actual length of the argument list is way below our limits. So the error messages are misleading. The problem must be somewhere else.
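One thing that might be worth ruling out, purely as an assumption and not as a diagnosis: on Linux the "Argument list too long" error (E2BIG) covers the environment as well as argv, and a single argument or environment string is also limited separately (128 KiB with 4 KiB pages, as far as I know). A short argument list can therefore still fail if the environment handed to the command is very large. The sketch below only measures the current shell's environment. The environment that make passes to a recipe may differ, so this is a starting point rather than proof.

```shell
# Upper bound for argv plus environment on this host.
arg_max=$(getconf ARG_MAX)

# Total size of the current environment in bytes (approximate: one byte
# per entry is counted for the newline instead of the NUL terminator).
env_bytes=$(env | wc -c | tr -d ' ')

# Length of the longest line in the environment dump. Values containing
# newlines are split, so treat this as a lower bound.
largest_entry=$(env | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')

echo "ARG_MAX:            $arg_max"
echo "environment bytes:  $env_bytes"
echo "longest env line:   $largest_entry"
```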
To make things a bit more confusing: After quite some experimenting, including completely new installations of the build host, I occasionally get a successful build. But this appears to happen at random. Sequences of "make dist-clean; make dist-all-rpms" sometimes succeed and sometimes fail. There are no dmesg warnings or other indications of a problem with the running VM.
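To put a number on "sometimes", the build could be repeated in a loop that counts failures. The helper count_failures is a made-up name, and the make invocation in the comment is only an example of how it might be called.

```shell
# count_failures N CMD...: run CMD N times and print how many runs failed.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example call (takes a long time for a kernel build):
#   count_failures 10 sh -c 'make dist-clean && make dist-all-rpms'
```

A failure count per N runs also makes it easier to tell whether a change such as a different -j value really has an effect or whether the difference is just noise.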
Is it possible that the -j$(nproc) causes the problem? I do see "-j12 forced in submake: resetting jobserver mode." warnings from make.
Best regards, Sebastian Hetze
On Sun, Mar 2, 2025 at 9:53 AM Sebastian Hetze shetze@redhat.com wrote:
> To make things a bit more confusing: After quite some experimenting, including completely new installations of the build host, I occasionally get a successful build. But this appears to happen at random. Sequences of "make dist-clean; make dist-all-rpms" sometimes succeed and sometimes fail. There are no dmesg warnings or other indications of a problem with the running VM.
> Is it possible that the -j$(nproc) causes the problem? I do see "-j12 forced in submake: resetting jobserver mode." warnings from make.
I personally use -j6 because anything higher is unpredictable on my machine. But if that doesn't work, lowering to -j4 should be better.
I have tried -j4 and reduced the number of cores for the VM down to 4 and other than increasing the compile time I see no difference. The compile still fails with the rsync: Argument list too long error.
Comparing the verbose output for successful and failing builds, I see warnings in the failing builds:
Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'
Warning: missing liburing support. Some tests will be skipped.
Warning: you seem to have a broken 32-bit build
I also see the use of clang in the failing builds, while the successful ones don't have it.
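A comparison like this can be made repeatable with a small helper that lists the warnings present only in the failing log. The function name and the log file names are hypothetical. It assumes the warnings start with "Warning:" at the beginning of a line, as in the excerpt above.

```shell
# warnings_only_in_failing GOOD_LOG BAD_LOG: print the "Warning:" lines
# that occur in BAD_LOG but not in GOOD_LOG.
warnings_only_in_failing() {
    good_log=$1
    bad_log=$2
    tmpdir=$(mktemp -d)
    # grep exits non-zero when nothing matches; that is not an error here.
    { grep '^Warning:' "$good_log" || true; } | sort -u > "$tmpdir/good"
    { grep '^Warning:' "$bad_log" || true; } | sort -u > "$tmpdir/bad"
    # comm -13 keeps only the lines unique to the second (failing) file.
    comm -13 "$tmpdir/good" "$tmpdir/bad"
    rm -rf "$tmpdir"
}

# Example call with hypothetical log names:
#   warnings_only_in_failing output_good output_bad
```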
Does this give any indication of what is going wrong?
Best regards, Sebastian Hetze
Hi *,
I need help. Can anyone confirm that this problem is reproducible? Or is this really a unique condition of my local build environment? I am willing to spend time on further investigation, but without any hint as to which direction to look in, this is becoming frustrating. I cannot believe the local build target for the centos-stream-9 kernel is simply not supposed to work.
Best regards, Sebastian Hetze
On Tue, Mar 4, 2025 at 2:02 PM Sebastian Hetze shetze@redhat.com wrote:
Can you reproduce this issue with the main branch for the centos stream 9 kernel? I used to build cs9 kernels regularly before I switched to ark for hyperscale, and I did have to make a number of fixes to make it work properly. Maybe it regressed? I'd be curious if it's fine with the regular kernel sources as opposed to the automotive ones.
The issue is with the *main* branch for the centos stream 9 kernel and I can reproduce it more than 80% of the time. Only on rare occasions does the build succeed. As said, the failing builds come with a couple of warnings that I do not see on the succeeding ones.
Best regards, Sebastian Hetze
Trying to compile older tagged versions, I find that kernel-5.14.0-556.el9 compiles successfully while kernel-5.14.0-557.el9 fails.
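With one good and one bad tag, the range could be narrowed further with git bisect. The wrapper below is a sketch under assumptions: bisect_between_tags is an invented name, both tags must exist in the local clone, and the build command in the comment is only an example.

```shell
# bisect_between_tags GOOD BAD CMD...: bisect the current repository,
# letting CMD decide per commit (exit 0 = good, 125 = skip,
# other codes from 1 to 127 = bad).
bisect_between_tags() {
    good=$1
    bad=$2
    shift 2
    git bisect start "$bad" "$good" || return 1
    status=0
    git bisect run "$@" || status=$?
    # Return the work tree to where it was before bisecting.
    git bisect reset
    return "$status"
}

# Example call from inside the kernel clone (each step is a full build):
#   bisect_between_tags kernel-5.14.0-556.el9 kernel-5.14.0-557.el9 \
#       sh -c 'make dist-clean && make dist-all-rpms'
```

Because the failure is intermittent, a single successful build could mark a bad commit as good. Running the build several times per step and treating any failure as bad would make the result more trustworthy.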
Best regards, Sebastian Hetze