|
Bugzilla – Full Text Bug Listing |
| Summary: | Kernel 5.12 built with gcc11 pagefaults at boot | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Dominique Leuenberger <dimstar> |
| Component: | Kernel | Assignee: | openSUSE Kernel Bugs <kernel-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | fvogt, jslaby, martin.liska, rguenther |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Bug Depends on: | |||
| Bug Blocks: | 1181859 | ||
|
Description
Dominique Leuenberger
2021-05-10 09:46:45 UTC
I see ffffffff811b6ee0 <trace_event_raw_init>: ffffffff811b6ee0: 48 83 c7 20 add $0x20,%rdi ffffffff811b6ee4: e8 97 57 ff ff callq ffffffff811ac680 <registe r_trace_event> ffffffff811b6ee9: 83 f8 01 cmp $0x1,%eax ffffffff811b6eec: 19 c0 sbb %eax,%eax ffffffff811b6eee: 83 e0 ed and $0xffffffed,%eax ffffffff811b6ef1: c3 retq ffffffff811b6ef2: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) ffffffff811b6ef9: 00 00 00 00 ffffffff811b6efd: 0f 1f 00 nopl (%rax) ffffffff811ac680 <register_trace_event>: ffffffff811ac680: 41 54 push %r12 ffffffff811ac682: 53 push %rbx ... ffffffff811ac740: 48 89 71 08 mov %rsi,0x8(%rcx) ffffffff811ac744: 48 89 4b 10 mov %rcx,0x10(%rbx) ffffffff811ac748: 48 89 43 18 mov %rax,0x18(%rbx) ffffffff811ac74c: 48 89 30 mov %rsi,(%rax) <--- ffffffff811ac74f: 48 8b 53 28 mov 0x28(%rbx),%rdx ffffffff811ac753: 48 83 3a 00 cmpq $0x0,(%rdx) ffffffff811ac757: 0f 84 8a 00 00 00 je ffffffff811ac7e7 <register_trace_event+0x167> just looking at vmlinux of kernel-default built in :N So I can reproduce the kernel crash locally with a random small package build against Staging:N like devel:gcc/gcc> osc build --alternative-project=openSUSE:Factory:Staging:N --vm-type=kvm -x -vim standard x86_64 gcc.spec that might provide a way to bisect either the kernel (the previous factory one worked) or GCC (where I assume GCC 10 as in Factory works). That said, debugging would also be nice. I'm going to bump GCC 11 in devel:gcc and see if that helps (I doubt). Hmm, I have no experience doing kernel debugging but attaching to the gdbserver from the qemu-kvm process after it's stuck after the crash doesn't provide anything useful. I can just "literally" (well, add quotes around the -append argument) paste the osc build log qemu-kvm invocation. That picks up the kernel from the osc setup osc-build-root/.mount/boot/kernel which is a standard bzImage and thus this way you can side-step building kernel-default and kernel-obs-build just copying over a kernel bzimage to that place and re-booting the KVM guest with qemu-kvm. Jiri promised he'll take a look tomorrow. Quite interestingly, I can reproduce with kernel-obs-build from Staging:N: $ qemu-kvm -kernel .build.kernel.kvm -initrd .build.initrd.kvm -hda /dev/null -m 2G -append 'console=ttyS0' -serial stdio > ... > [ 2.599705] BUG: unable to handle page fault for address: ffffffffc05ded30 > [ 2.601018] #PF: supervisor write access in kernel mode > [ 2.601784] #PF: error_code(0x0003) - permissions violation > [ 2.602632] PGD 25a15067 P4D 25a15067 PUD 25a17067 PMD 480b067 PTE 8000000005504061 > [ 2.603855] Oops: 0003 [#1] SMP PTI > [ 2.604387] CPU: 0 PID: 210 Comm: modprobe Not tainted 5.12.0-6-default #1 openSUSE Tumbleweed > [ 2.605788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014 > [ 2.607707] RIP: 0010:register_trace_event+0xcc/0x210 > ... > [ 2.626808] Call Trace: > [ 2.627388] trace_event_raw_init+0x9/0x20 > [ 2.628055] event_init+0x2d/0x70 > [ 2.628716] trace_module_notify+0x108/0x1f0 > [ 2.629637] blocking_notifier_call_chain_robust+0x64/0xd0 > [ 2.631256] ? mutex_lock+0xe/0x30 > [ 2.631895] load_module+0x20e4/0x2910 > ... > [ 2.651413] Modules linked in: xfs(+) btrfs blake2b_generic xor raid6_pq libcrc32c reiserfs squashfs fuse dm_snapshot dm_bufio essiv authenc dm_crypt trusted dm_mod binfmt_misc loop sg Note it dies during load of xfs module. But I cannot repro with a kernel built outside OBS: factory:~ # dmesg|head -1 > [ 0.000000] Linux version 5.12.2-default (jslaby@kunlun) (gcc-11 (SUSE Linux) 11.1.1 20210429 [revision cd0a059bd384da58d43674496a79ecb7de610800], GNU ld (GNU Binutils; devel:gcc / openSUSE_Leap_15.1) 2.36.0.20210204-lp151.403) #8 SMP Tue May 11 08:07:05 CEST 2021 factory:~ # lsmod|grep xfs > xfs 1753088 0 > libcrc32c 16384 1 xfs Investigating. Hmm, building in OBS or in Tumbleweed VM (with gcc-11 from TW installed), I see a _lot_ of:
osc rbl openSUSE:Factory:Staging:N kernel-default standard x86_64 |grep section
> ld: warning: orphan section `.data.event_initcall_finish' from `init/main.o' being placed in section `.data.event_initcall_finish'
> Warning: setting incorrect section attributes for .note.Linux
and the result is the above crash. So the build environment matters too (15-SP1 and bintutils+gcc-11 from devel:gcc is not enough).
Note TW's gcc-11 version is 11.0.0 20210208 (experimental) [revision efcd941e86b507d77e90a1b13f621e036eacdb45]
Some notable difference with GCC 11 is support for __attribute__((retain)) (as opposed to 'used') which will result in sections with SHF_RETAIN flags. Does the kernel now make use of that? (and did anybody test that?) Having kernel sources, using this build script:
#!/bin/bash
set -xxe
JOBS=`grep -c ^processor.*: /proc/cpuinfo`
make CC=gcc-11 -j$JOBS bzImage modules
rm -rf ddd
make CC=gcc-11 -j$JOBS modules_install INSTALL_MOD_PATH="$PWD/ddd"
MODS=`ls -d "$PWD"/ddd/lib/modules/*-default`
VER="${MODS##*/}"
# from kernel-obs-build
KERNEL_MODULES="loop dm-crypt essiv dm-mod dm-snapshot binfmt-misc fuse kqemu squashfs ext2 ext3 ext4 reiserfs btrfs xfs nf_conntrack_ipv6 binfmt_misc virtio_pci virtio_mmio virtio_blk virtio_rng fat vfat nls_cp437 nls_iso8859-1 ibmvscsi sd_mod e1000 ibmveth overlay"
dracut --force -k "$MODS" --add-drivers "$KERNEL_MODULES" \
initrd "$VER"
------------------------- END
Then:
$ qemu-kvm -hda /dev/null -kernel /tmp/bzImage -initrd /tmp/initrd -append 'console=ttyS0 rd.driver.pre=xhci_pci root=/dev/sda' -serial stdio -m 2G
yields a different crash, but still related to section mismatch above. To obtain the xfs-module-load crash, one needs a working qemu img.
The only gcc showing the problem is the one in TW: gcc11-11.0.0+git183291-1.5.x86_64. Neither gcc11-11.1.1+git67, nor gcc11-11.1.1+git121 exhibit the problem. So the situation in Staging:N must have been caused by gcc meta pkg set to 11, without submitted or built gcc11. |