Bug 211859

Summary: oom_adj -17 does not prevent OOM killing a task
Product: [openSUSE] openSUSE 10.2 Reporter: Olaf Hering <ohering>
Component: Kernel Assignee: Nick Piggin <npiggin>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None    
Version: Alpha 5   
Target Milestone: ---   
Hardware: PowerPC   
OS: Linux   
Whiteboard: released:kernel:sles10,sles10sp1,10.2
Found By: Development Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: dmesg-2.6.18-11-default.txt
fix for SLES10SP1

Description Olaf Hering 2006-10-12 09:27:48 UTC
echo -n -17 > /proc/2969/oom_adj
This does not prevent the kernel from killing pid 2969.

Out of Memory: Kill process 2401 (YaST2.call) score 1789 and children.
Out of memory: Killed process 2969 (Xorg).

Setting /proc/sys/vm/overcommit_memory from 1 to 0 does not help either.
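
To rule out a failed write as the cause, the echo command above can be wrapped in a small helper that reads the value back. This is only a sketch: the function name set_oom_adj and the proc_root parameter are hypothetical and exist only so the helper can be pointed at a scratch directory instead of the real /proc.

```python
from pathlib import Path

OOM_DISABLE = -17  # the oom_adj value used in this report


def set_oom_adj(pid: int, value: int = OOM_DISABLE, proc_root: str = "/proc") -> int:
    """Write `value` to <proc_root>/<pid>/oom_adj and return the value read back.

    proc_root is a hypothetical parameter for illustration and testing;
    on a real system it stays at "/proc".
    """
    path = Path(proc_root) / str(pid) / "oom_adj"
    # Same effect as: echo -n -17 > /proc/<pid>/oom_adj
    path.write_text(f"{value}")
    # Read the value back so a silently ignored write would be noticed.
    return int(path.read_text().strip())
```

Calling set_oom_adj(2969) on the affected machine would be expected to return -17; writing a negative value normally requires root privileges. A matching read-back only shows that the setting was stored, not that the OOM killer honors it.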
Comment 1 Olaf Hering 2006-10-12 09:29:14 UTC
Created attachment 101290
dmesg-2.6.18-11-default.txt

There was no swap at that time; the dmesg is just from YaST probing the available swap partitions.
Comment 2 Nick Piggin 2006-10-12 09:40:27 UTC
We have two problems.

When oom killing a task and its children, the children are killed without
regard for oom_adj == -17. That is the bug being hit here.

Another problem is that in the architecture page-fault handlers, any task that
encounters an -ENOMEM in the fault path is killed immediately. Instead, the OOM
condition should be passed to mm/oom_kill.c.
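
The first problem (children killed regardless of oom_adj) can be pictured with a small user-space simulation. This is a rough sketch under stated assumptions, not the kernel code or the actual patch: the Task class, the pick_kill_target function and the honor_oom_adj flag are all hypothetical, and the parent/child relationship between the two pids is inferred from the log lines in the description.

```python
from dataclasses import dataclass, field
from typing import List

OOM_DISABLE = -17  # oom_adj value meaning "never OOM-kill this task"


@dataclass
class Task:
    """Hypothetical, heavily simplified stand-in for a kernel task."""
    pid: int
    name: str
    mm: int                      # identifies the task's address space
    oom_adj: int = 0
    children: List["Task"] = field(default_factory=list)


def pick_kill_target(victim: Task, honor_oom_adj: bool = True) -> Task:
    """Return the task that gets killed on behalf of `victim`, children first.

    `victim` is assumed to have been chosen already by the OOM scoring.
    With honor_oom_adj=False the function mimics the reported behaviour.
    """
    for child in victim.children:
        if child.mm == victim.mm:
            continue  # shares the parent's address space, so skip it
        if honor_oom_adj and child.oom_adj == OOM_DISABLE:
            continue  # the check that was missing according to this report
        return child
    return victim  # no killable child: fall back to the victim itself


# Illustrative data using the pids from the description.
xorg = Task(pid=2969, name="Xorg", mm=2, oom_adj=OOM_DISABLE)
yast = Task(pid=2401, name="YaST2.call", mm=1, children=[xorg])

print(pick_kill_target(yast, honor_oom_adj=False).name)  # Xorg
print(pick_kill_target(yast, honor_oom_adj=True).name)   # YaST2.call
```

With the check in place, the protected child is skipped and the kill falls back to the selected parent, which is the behaviour the reporter expected from oom_adj -17.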
Comment 3 Nick Piggin 2007-01-23 00:37:36 UTC
Created attachment 114296
fix for SLES10SP1
Comment 4 Nick Piggin 2007-01-23 00:40:36 UTC
OK, the above patch for SLES10SP1 fixes the issue of children being killed, and is also upstream.

The pagefault issue is much rarer, and probably only theoretical on most
architectures. I have not included a fix for that because I haven't got anything
upstream yet, and the fix is a bit more involved.

Not sure about SL10 kernels; I'll clarify which branches need to be patched.
Comment 5 Nick Piggin 2007-01-24 04:33:55 UTC
Applied to SL102, SLES10_GA and SLES10_SP1.

Thanks.
Comment 6 Klaus Wagner 2007-02-06 12:49:33 UTC
Corrected Syntax of Whiteboard entry to Maint. standard syntax for better 
tracking: s/kernels/kernel:s/
Comment 7 Klaus Wagner 2007-03-13 12:52:12 UTC
Just for the record:

Patch: patches.fixes/oom-child-kill-fix.patch
published in SLE10 kernelupdate 2.6.16.27-0.9,
dated Feb 13, 2007 & released Feb 23, 2007.

Setting Whiteboard Status for SLE10 --> released
Comment 8 Klaus Wagner 2008-01-23 10:54:03 UTC
Again, for the record:
 
Patch: patches.fixes/oom-child-kill-fix.patch
 
also included, enabled and released in
 -  SLE10 SP1 GA kernel 2.6.16.46-0.12 (May 2007)
 -  openSUSE 10.2 update kernel 2.6.18.8-0.3 (Apr 2007)
 
Setting Whiteboard Statuses for SLE10SP1 and 10.2 --> released