Bug 1192945

Summary: s390x: kernel issue on OBS builder
Product: [SUSE Linux Enterprise Server] PUBLIC SUSE Linux Enterprise Server 15 SP3
Reporter: Lubos Kocman <lubos.kocman>
Component: Kernel
Assignee: Kernel Bugs <kernel-bugs>
Status: RESOLVED DUPLICATE
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P3 - Medium
CC: ada.lovelace, bugproxy, geraldsc, gery.schneider, ihno, tiwai, tstaudt
Version: SLES15SP3Maint-Upd
Target Milestone: unspecified
Hardware: S/390-64
OS: Other
See Also: https://bugzilla.linux.ibm.com/show_bug.cgi?id=195484
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Lubos Kocman 2021-11-22 12:44:12 UTC
Hello team,


This issue affects our openSUSE Leap 15.4 builds on s390x. The builders run SP3, hence the report against the SP3 product.


We seem to be hitting a random issue that leads to extremely long build times.
It happened in one out of 5k cases.

Last build of Jamulus in Backports: openSUSE:Backports:SLE-15-SP4/Jamulus/standard/s390x
1:32
Worker: s390zl22:3 Buildtime: 2 days (57636%)

2021-11-15 06:36:18  Jamulus                                            meta change      succeeded                4m 23s   s390zl24:7
2021-11-15 10:20:10  Jamulus                                            meta change      unchanged                4m  4s   s390zl26:6
2021-11-19 23:11:43  Jamulus                                            meta change      failed                   0m 54s   s390zl28:3
2021-11-22 12:25:39  Jamulus                                            meta change      failed           1d 18h  6m 51s   s390zl22:3
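
To put the history above in numbers, here is a minimal Python sketch that converts the duration column to seconds and compares the stalled build with the last successful one. The helper name `duration_seconds` is made up for illustration and is not part of any OBS tooling. The resulting factor is of the same magnitude as the 57636% shown for the worker, although how OBS computes that percentage is not stated in this report.

```python
import re

# Seconds per unit as used in the build history ("1d 18h  6m 51s").
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def duration_seconds(text):
    """Convert a duration such as '1d 18h  6m 51s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u]
               for n, u in re.findall(r"(\d+)\s*([dhms])", text))

normal = duration_seconds("4m 23s")           # succeeded build on s390zl24:7
stalled = duration_seconds("1d 18h  6m 51s")  # failed build on s390zl22:3

print(normal, stalled, round(stalled / normal))  # prints: 263 151611 576
```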


Logs say:

[132899s] [132890.524958][    C1] rcu: 	0-...!: (4070364 ticks this GP) idle=612/1/0x4000000000000002 softirq=17036/17036 fqs=12 
[132899s] [132890.525048][    C1] rcu: rcu_sched kthread starved for 13277955 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[132899s] [132890.525104][    C1] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[132899s] [132890.525151][    C1] rcu: RCU grace-period kthread stack dump:
[132899s] [132890.525213][    C1] rcu: Stack dump where RCU GP kthread last ran:
[133079s] [133070.537133][    C0] rcu: INFO: rcu_sched self-detected stall on CPU
[133079s] [133070.821610][    C0] rcu: 	0-...!: (4074271 ticks this GP) idle=612/1/0x4000000000000002 softirq=17036/17036 fqs=12 
[133079s] [133070.821725][    C0] rcu: rcu_sched kthread starved for 13295987 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[133079s] [133070.821817][    C0] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[133079s] [133070.832750][    C0] rcu: RCU grace-period kthread stack dump:
[133079s] [133070.832854][    C0] rcu: Stack dump where RCU GP kthread last ran:
[133259s] [133250.774194][    C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[133259s] [133250.879319][    C1] rcu: 	0-...!: (4078116 ticks this GP) idle=612/1/0x4000000000000000 softirq=17036/17036 fqs=12 
[133259s] [133250.879501][    C1] rcu: rcu_sched kthread starved for 13313982 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[133259s] [133250.879560][    C1] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[133259s] [133250.879608][    C1] rcu: RCU grace-period kthread stack dump:
[133259s] [133250.879672][    C1] rcu: Stack dump where RCU GP kthread last ran:
[133439s] [133431.048936][    C0] rcu: INFO: rcu_sched self-detected stall on CPU
[133439s] [133431.049130][    C0] rcu: 	0-...!: (4081728 ticks this GP) idle=612/1/0x4000000000000002 softirq=17036/17036 fqs=12 
[133439s] [133431.049293][    C0] rcu: rcu_sched kthread starved for 13332010 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[133439s] [133431.049371][    C0] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[133439s] [133431.049429][    C0] rcu: RCU grace-period kthread stack dump:
[133439s] [133431.049515][    C0] rcu: Stack dump where RCU GP kthread last ran:
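
As a rough aid for reading the log above, the following Python sketch extracts the kernel timestamp and the "starved for N jiffies" counter from each report and estimates the tick rate from them. The regular expression and the function names are illustrative assumptions that match only the line format shown in this report. On these four samples the counter grows by about 100 jiffies per second, which would suggest HZ=100 on the builder (an inference from the log, not a confirmed kernel setting). Under that assumption the first report's 13277955 jiffies correspond to roughly 132780 seconds, close to the 132890 seconds on the kernel clock at that point.

```python
import re

# Kernel timestamp plus starvation counter, as printed in the log above.
STARVED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\[\s*C\d+\]\s+rcu: rcu_sched kthread "
    r"starved for (?P<jiffies>\d+) jiffies"
)

def parse_starvation(log_text):
    """Return a list of (kernel_timestamp_seconds, starved_jiffies)."""
    return [(float(m.group("ts")), int(m.group("jiffies")))
            for m in STARVED_RE.finditer(log_text)]

def jiffies_per_second(samples):
    """Estimate the tick rate from the first and last sample."""
    (t0, j0), (t1, j1) = samples[0], samples[-1]
    return (j1 - j0) / (t1 - t0)

# The four "starved" lines copied from the log in this report.
SAMPLE_LOG = """\
[132899s] [132890.525048][    C1] rcu: rcu_sched kthread starved for 13277955 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[133079s] [133070.821725][    C0] rcu: rcu_sched kthread starved for 13295987 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[133259s] [133250.879501][    C1] rcu: rcu_sched kthread starved for 13313982 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[133439s] [133431.049293][    C0] rcu: rcu_sched kthread starved for 13332010 jiffies! g5793 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
"""

samples = parse_starvation(SAMPLE_LOG)
rate = jiffies_per_second(samples)
print(f"estimated tick rate: {rate:.1f} jiffies/s")
print(f"first report: starved for about {samples[0][1] / round(rate):.0f} s")
```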
Comment 1 Takashi Iwai 2021-11-22 12:47:15 UTC
Sounds like a dup of bug 1192454
Comment 2 Lubos Kocman 2021-11-23 10:19:07 UTC
Marking as duplicate; the symptoms are the same.

*** This bug has been marked as a duplicate of bug 1192454 ***