Bug 1175503

Summary: WARNING: percpu ref (cgroup_bpf_release_fn) <= 0 (0) after switching to atomic
Product: [openSUSE] openSUSE Distribution Reporter: David Disseldorp <ddiss>
Component: Kernel Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: mt, tiwai
Version: Leap 15.2   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description David Disseldorp 2020-08-19 17:51:27 UTC
I hit this while running a large "btrfs send|ssh" job...

[58349.450953] ------------[ cut here ]------------
[58349.450958] percpu ref (cgroup_bpf_release_fn) <= 0 (0) after switching to atomic
[58349.450968] WARNING: CPU: 2 PID: 0 at ../lib/percpu-refcount.c:162 percpu_ref_switch_to_atomic_rcu+0xfd/0x120
[58349.450969] Modules linked in: vhost_net vhost tap xt_CHECKSUM xt_MASQUERADE tun bridge stp llc xfs zram snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_plantronics snd_seqo
[58349.450998]  intel_rapl_msr dell_smm_hwmon snd_hda_codec_hdmi kvm iwlwifi snd_hda_intel irqbypass snd_hda_codec dell_wmi snd_hda_core dell_smbios dcdbas joydev sparse_keys
[58349.451024] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.18-lp152.33-default #1 openSUSE Leap 15.2 (unreleased)
[58349.451025] Hardware name: Dell Inc. Latitude E7450/0YGN55, BIOS A21 05/16/2019
[58349.451026] RIP: 0010:percpu_ref_switch_to_atomic_rcu+0xfd/0x120
[58349.451027] Code: 00 e9 71 ff ff ff 80 3d 0c 38 06 01 00 75 83 48 8b 55 d8 48 8b 75 e8 48 c7 c7 a8 2a 53 87 c6 05 f4 37 06 01 01 e8 73 0d bc ff <0f> 0b e9 61 ff ff ff f0 f
[58349.451028] RSP: 0018:ffffaf2500134ed8 EFLAGS: 00010282
[58349.451029] RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: 0000000000000000
[58349.451030] RDX: 0000000000000045 RSI: ffffffff87f914c5 RDI: 0000000000000246
[58349.451030] RBP: ffff9b7aa404e938 R08: ffffffff87f91480 R09: 000000000002c580
[58349.451031] R10: 0000000000000000 R11: 0000000080000002 R12: 000033a761a0d1b0
[58349.451031] R13: ffffffff879692c0 R14: 000000000000000a R15: ffffffff87869808
[58349.451032] FS:  0000000000000000(0000) GS:ffff9b7d9e300000(0000) knlGS:0000000000000000
[58349.451033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58349.451033] CR2: 000055932cfa81e0 CR3: 0000000193dbe002 CR4: 00000000003606e0
[58349.451034] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58349.451034] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58349.451034] Call Trace:
[58349.451036]  <IRQ>
[58349.451039]  ? percpu_ref_exit+0x30/0x30
[58349.451041]  rcu_core+0x1b5/0x730
[58349.451044]  __do_softirq+0xe3/0x2dc
[58349.451046]  irq_exit+0xa6/0xb0
[58349.451048]  smp_apic_timer_interrupt+0x74/0x130
[58349.451049]  apic_timer_interrupt+0xf/0x20
[58349.451050]  </IRQ>
[58349.451052] RIP: 0010:cpuidle_enter_state+0xa7/0x430
[58349.451053] Code: 7f 4d 49 79 e8 ea 57 95 ff 49 89 c5 0f 1f 44 00 00 31 ff e8 2b 68 95 ff 80 7c 24 0b 00 0f 85 e2 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 89 fb 01 00 9
[58349.451053] RSP: 0018:ffffaf25000bbe80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[58349.451055] RAX: ffff9b7d9e32cd00 RBX: ffffffff878e1960 RCX: 000000000000001f
[58349.451055] RDX: 000035118a4bdb8b RSI: 00000000315841e7 RDI: 0000000000000000
[58349.451056] RBP: ffff9b7d9e337300 R08: 0000000000000002 R09: 000000000002c580
[58349.451057] R10: ffffaf25000bbe60 R11: 00000000000001c1 R12: 0000000000000002
[58349.451057] R13: 000035118a4bdb8b R14: 0000000000000002 R15: 0000000000000000
[58349.451060]  cpuidle_enter+0x29/0x40
[58349.451062]  do_idle+0x1f2/0x270
[58349.451064]  cpu_startup_entry+0x19/0x20
[58349.451066]  start_secondary+0x16a/0x1c0
[58349.451068]  secondary_startup_64+0xb6/0xc0
[58349.451069] ---[ end trace 9aa9bafec2b60f0f ]---

Looks like it could be:
https://bugzilla.redhat.com/show_bug.cgi?id=1843546

which points to the following potential fix:
ad0f75e5f57ccbceec13274e1e242f2b5a6397ed

I didn't have any BPF tools loaded at the time of the warning.
Comment 1 Takashi Iwai 2020-08-19 17:57:36 UTC
The suggested fix should have been already included in the latest openSUSE-15.2 branch.

There was already a release of the SLE15-SP2 update for this (bsc#1175213), but Leap 15.2 seems to have been forgotten or stalled. Please check with the openSUSE-15.2 KOTD.
Comment 2 David Disseldorp 2020-08-19 21:01:20 UTC
(In reply to Takashi Iwai from comment #1)
> The suggested fix should have been already included in the latest
> openSUSE-15.2 branch.
> 
> There was already a release of the SLE15-SP2 update for this (bsc#1175213), but
> Leap 15.2 seems to have been forgotten or stalled. Please check with the
> openSUSE-15.2 KOTD.

Thanks, I'll check when I run the next backup job in a month or so. In the meantime, feel free to close this or dup it against bsc#1175213 - I'll reopen if I hit it again.
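
For re-checking after a later run, the following is a minimal sketch (not part of the original report) that scans saved kernel log text for the warning signature quoted in the description. The function name find_percpu_ref_warnings and the regular expression are illustrative assumptions; the pattern is derived only from the single warning line above, and other kernel versions may word the message differently.

```python
import re

# Pattern derived from the warning line quoted in this report.
# Assumption: the message wording is the same on the kernel being checked.
WARNING_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+percpu ref \((?P<release_fn>\w+)\) <= 0 "
    r"\((?P<count>-?\d+)\) after switching to atomic"
)


def find_percpu_ref_warnings(log_text):
    """Return (timestamp, release function, count) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = WARNING_RE.match(line)
        if match:
            hits.append(
                (
                    float(match.group("ts")),
                    match.group("release_fn"),
                    int(match.group("count")),
                )
            )
    return hits
```

The function takes the log as a string, so it could be fed the captured output of dmesg or the contents of a saved log file. An empty result only means this exact message was not found; it does not prove the fix is present.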
Comment 3 Takashi Iwai 2020-08-20 07:24:17 UTC
OK, let's close this for now.

*** This bug has been marked as a duplicate of bug 1175213 ***
Comment 4 Michal Koutný 2020-09-03 07:26:42 UTC
*** Bug 1176066 has been marked as a duplicate of this bug. ***