Bug 672055

Summary: ulimit package prevents mapping big files
Product: [openSUSE] openSUSE 12.2 Reporter: Jan Engelhardt <jengelh>
Component: Basesystem Assignee: Kurt Garloff <suse>
Status: VERIFIED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Minor    
Priority: P5 - None CC: jeremy.figgins, jslaby, mgorman
Version: Factory   
Target Milestone: ---   
Hardware: x86-64   
OS: Linux   
Whiteboard:
Found By: Beta-Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Jan Engelhardt 2011-02-15 11:44:57 UTC
The following program fails to obtain a VMA on a system with less than 2 GB of virtual memory (physical RAM + swap):

---
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
  void *p = mmap(NULL, 2000000000, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, 0, 0);
  printf("%p\n", p);
}
---

Linux normally does overcommit, does it not?
/proc/sys/vm # grep '' overc*
overcommit_memory:0
overcommit_ratio:50

The mmap call also fails if one simply tries to map, for example, a large enough DVD image file in read-only mode — a case where no memory allocation should occur at all.
Comment 1 Jiri Slaby 2011-02-16 13:44:55 UTC
Does it happen with overcommit_memory == 1? 0 means a heuristic which might refuse to overcommit.

I don't think the former is a bug anyway. Why do you think it is?

The map of DVD may be a bug though.
Comment 2 Jan Engelhardt 2011-02-16 14:11:10 UTC
>Does it happen with overcommit_memory == 1?

It also happens with 1 or 2 and overcommit_ratio=100.
Comment 3 Mel Gorman 2011-02-16 14:22:19 UTC
(In reply to comment #1)
> Does it happen with overcommit_memory == 1? 0 means a heuristic which might
> refuse to overcommit.
> 

Agreed. mmap() would be expected to fail if
/proc/sys/vm/overcommit_memory was 0 (the default). From the documentation
"When this flag is 0, the kernel attempts to estimate the amount of free memory
left when userspace requests more memory."

> I don't think the former is a bug anyway. Why do you think it is?
> 
> The map of DVD may be a bug though.

Maybe.

While this is x86-64, was the test program a 64-bit binary? If it was 32-bit, an
mmap() of a large file would still be expected to fail due to a lack of virtual
address space, which would have nothing to do with available memory. In a similar
vein, was O_LARGEFILE specified to open() and _LARGEFILE64_SOURCE defined?
Comment 4 Jan Engelhardt 2011-02-16 14:32:33 UTC
All components were already x86_64. As such, O_LARGEFILE shouldn't be needed, should it?
Comment 5 Mel Gorman 2011-02-16 14:45:19 UTC
(In reply to comment #4)
> All components were already x86_64. As such, O_LARGEFILE shouldn't be needed,
> should it?

If all components are indeed x86-64 then it shouldn't be needed.
Comment 6 Mel Gorman 2011-02-16 16:57:45 UTC
(In reply to comment #2)
> >Does it happen with overcommit_memory == 1?
> 
> It also happens with 1 or 2 and overcommit_ratio=100.

What version of openSUSE exactly are you testing? The behavior I see with your test program on a machine with 1G of RAM, no swap, and openSUSE Factory downloaded yesterday is as follows:

hydra:~ # uname -a
Linux hydra 2.6.37-20-desktop #1 SMP PREEMPT 2011-01-22 00:41:44 +0100 x86_64 x86_64 x86_64 GNU/Linux
hydra:~ # cat /etc/SuSE-release
openSUSE 11.4 RC 1 (x86_64)
VERSION = 11.4
CODENAME = Celadon
hydra:~ # echo 0 > /proc/sys/vm/overcommit_memory 
hydra:~ # ./testcase-vma-map-anon 
0xffffffffffffffff
hydra:~ # echo 1 > /proc/sys/vm/overcommit_memory 
hydra:~ # ./testcase-vma-map-anon 
0x7fa158fa1000
hydra:~ # echo 2 > /proc/sys/vm/overcommit_memory 
hydra:~ # ./testcase-vma-map-anon 
0xffffffffffffffff

This is as expected: the value of 1 overcommitted unconditionally, and the other options prevented the mapping due to a lack of physical memory. I also tried as a normal user but didn't see any problems.
Comment 7 Jan Engelhardt 2011-02-16 17:51:02 UTC
jengelh@jng-0:/dev/shm> ./m
0xffffffffffffffff
jengelh@jng-0:/dev/shm> free
             total       used       free     shared    buffers     cached
Mem:       2050944     107560    1943384          0      12364      60004
-/+ buffers/cache:      35192    2015752
Swap:            0          0          0
jengelh@jng-0:/dev/shm> cat m.c
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
  void *p = mmap(NULL, 2000000000, PROT_READ | PROT_WRITE, MAP_SHARED |
MAP_ANONYMOUS, 0, 0);
  printf("%p\n", p);
}
jengelh@jng-0:/dev/shm> su
Password: 
jng-0:/dev/shm # echo 1 >/proc/sys/vm/overcommit_memory 
jng-0:/dev/shm # exit
jengelh@jng-0:/dev/shm> ./m
0xffffffffffffffff
jengelh@jng-0:/dev/shm> cat /etc/SuSE-release
openSUSE 11.4 Milestone 6 of 6 (x86_64)
VERSION = 11.4
CODENAME = Celadon
jengelh@jng-0:/dev/shm> uname -a
Linux jng-0 2.6.38-rc1+ #25 SMP PREEMPT Wed Feb 16 18:01:16 CET 2011 x86_64 x86_64 x86_64 GNU/Linux
Comment 8 Jiri Slaby 2011-02-16 22:47:52 UTC
(In reply to comment #7)
> jengelh@jng-0:/dev/shm> cat /etc/SuSE-release
> openSUSE 11.4 Milestone 6 of 6 (x86_64)
> VERSION = 11.4
> CODENAME = Celadon
> jengelh@jng-0:/dev/shm> uname -a
> Linux jng-0 2.6.38-rc1+ #25 SMP PREEMPT Wed Feb 16 18:01:16 CET 2011 x86_64
> x86_64 x86_64 GNU/Linux

Maybe there was a bug in .38-rc1? Could you retry with the current kotd (rc4)? I cannot reproduce either. (Also RC1 is out as of today. It should have no effect, but who knows if there is any change in glibc. So first, fully upgrade and retry.)
Comment 9 Jan Engelhardt 2011-02-17 01:40:42 UTC
It seems this is not dependent on the kernel version. I have two hosts (one live machine, and a VM on it) both running the same kernel (based upon commit cb76856db0aa2385d9a9e8ce595467bd3ac30fce from kernel-source git).

Trying to create a 25 GB read-only mapping of /dev/zero fails in the host-side 11.3 environment, while it works in the VM-side 11.4 RC1. I seem to remember that the 11.4 M6 I had in the VM this morning (*without* systemd) also rejected the big mapping.
Comment 10 Mel Gorman 2011-02-17 09:30:31 UTC
(In reply to comment #9)
> It seems this is not dependent on the kernel version. I have two hosts (one
> live machine, and a VM on it) both running the same kernel (based upon commit
> cb76856db0aa2385d9a9e8ce595467bd3ac30fce from kernel-source git).
> 

So is this failure intermittent, or does a machine that fails once fail consistently? If it's intermittent, it's worth dumping out the contents of /proc/self/maps at the time of failure to rule out virtual address space fragmentation as the source of the mapping failure.

> Trying to map a 25GB ro mapping on /dev/zero fails in the host-side 11.3
> environment, while it does work with the VM-side 11.4 RC1. I seem to remember
> that 11.4 M6 that I had this morning in the VM (*without* systemd) also
> rejected the big mapping.

If it fails on some machines and not on others, can you confirm that the ulimits for all machines and the running user are the same and that /proc/sys/vm/overcommit_memory is 1 in all cases please?
Comment 11 Jan Engelhardt 2011-02-17 13:32:25 UTC
*** Bug 640341 has been marked as a duplicate of this bug. ***
Comment 12 Jan Engelhardt 2011-02-17 13:38:08 UTC
*** Bug 645203 has been marked as a duplicate of this bug. ***
Comment 13 Jan Engelhardt 2011-02-17 13:40:06 UTC
ulimit is the keyword.

--- -   2011-02-17 13:37:12.651260449 +0100
+++ VM  2011-02-17 13:37:05.805295838 +0100
-max memory size         (kbytes, -m) 21025344
+max memory size         (kbytes, -m) unlimited
-virtual memory          (kbytes, -v) 23142880
+virtual memory          (kbytes, -v) unlimited

/etc/sysconfig/ulimit is the same for both hosts. Is it perhaps that systemd does not execute /etc/initscripts anymore?
Comment 14 Kay Sievers 2011-02-17 14:28:39 UTC
(In reply to comment #13)
> ulimit is the keyword.
> 
> --- -   2011-02-17 13:37:12.651260449 +0100
> +++ VM  2011-02-17 13:37:05.805295838 +0100
> -max memory size         (kbytes, -m) 21025344
> +max memory size         (kbytes, -m) unlimited
> -virtual memory          (kbytes, -v) 23142880
> +virtual memory          (kbytes, -v) unlimited
> 
> /etc/sysconfig/ulimit is the same for both hosts. Is it perhaps that systemd
> does not execute /etc/initscripts anymore?

Right, it doesn't. None of these files are read or executed by systemd. These values can be set only in individual service files.

systemd will not use /etc/sysconfig/ or always start a shell script for general service startup. We might need some new global config here.

I'll check with upstream ...
Comment 15 Jan Engelhardt 2011-02-17 14:44:37 UTC
So for me, the solution is to just remove the ulimit package on my installs.

Still, since the ulimit package is installed by default (is it actually?), it effectively prevents mmapping big files, causing weird observations like these.
Comment 18 Kun Kun Zhang 2012-03-08 03:24:17 UTC
No response for a long time, so closing. Feel free to reopen it. Thanks.
Comment 19 Jan Engelhardt 2012-03-15 16:22:11 UTC
Revert bogus close. See c#15
Comment 20 Jan Engelhardt 2012-03-15 16:25:10 UTC
*** Bug 620635 has been marked as a duplicate of this bug. ***
Comment 22 Kurt Garloff 2014-02-13 16:03:53 UTC
The ulimit package is gone, as we have better ways to do resource control these days (with cgroups and systemd).
I still think the defaults were reasonable for most installs before, preventing one single application from monopolizing all resources ...
You can adjust the settings in /etc/sysconfig/ulimit in case you have specific needs.