Bug 1172897

Summary: VirtualBox kernel modules will not run with kernel 5.8
Product: [openSUSE] openSUSE Tumbleweed Reporter: Larry Finger <Larry.Finger>
Component: Virtualization:Other Assignee: Larry Finger <Larry.Finger>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: dfaggioli, hpj, jslaby, mhocko, mkubecek, vbabka
Version: Current   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE Factory   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Changes needed for kernel 5.8 for patch to module_memory to work

Description Larry Finger 2020-06-14 17:16:35 UTC
Created attachment 838772 [details]
Changes needed for kernel 5.8 for patch to module_memory to work

There have been a number of fixes needed to build vboxdrv with kernel 5.8. The changes handle the following differences in the kernel API:

1. In struct mm_struct, member mmap_sem was renamed to mmap_lock.
2. The information in cpu_tlbstate is no longer exported.
3. The routines __get_vm_area() and map_vm_area() no longer exist and their
   replacements are not exported. Two fixes have been attempted:
   a. The missing routines were not available until kernel 2.6.23, thus the code
      was changed to revert back to the "old" method. Unfortunately, this did not
      work, and likely it will require Oracle to make the changes.
   b. The replacements for __get_vm_area() and map_vm_area() are implemented. The
      resulting code builds but gets missing globals on loading. For testing, my
      kernel was modified as shown in the attached patch.
      This change cannot be permanent, but it can be temporary.
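
For illustration, the rename in item 1 is the kind of change an out-of-tree module usually absorbs with a version-gated compatibility macro. The sketch below is a userspace mock, not the actual VirtualBox change: `mock_mm_struct`, `COMPAT_MM_LOCK`, `COMPAT_MM_LOCK_NAME` and `compat_mm_lock_name()` are hypothetical names, and the default `LINUX_VERSION_CODE` is a stand-in for the value that `<linux/version.h>` supplies in a real module build.

```c
#include <assert.h>
#include <string.h>

/* Same encoding as the kernel's <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: stand-in for the value a real module build gets from the kernel headers. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(5, 8, 0)
#endif

/* Hypothetical mock of struct mm_struct, reduced to the one renamed member. */
struct mock_mm_struct {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0)
    int mmap_lock;   /* member name from 5.8 on */
#else
    int mmap_sem;    /* member name before 5.8 */
#endif
};

/* Hypothetical compat macros: callers never spell the member name themselves. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0)
#define COMPAT_MM_LOCK(mm)  (&(mm)->mmap_lock)
#define COMPAT_MM_LOCK_NAME "mmap_lock"
#else
#define COMPAT_MM_LOCK(mm)  (&(mm)->mmap_sem)
#define COMPAT_MM_LOCK_NAME "mmap_sem"
#endif

/* Returns the member name selected for this build after checking that the
 * macro resolves to the mock's only member. */
static const char *compat_mm_lock_name(void)
{
    struct mock_mm_struct mm = { 0 };
    int *lock = COMPAT_MM_LOCK(&mm);

    assert(lock == (int *)&mm); /* single member, so it sits at offset 0 */
    return COMPAT_MM_LOCK_NAME;
}
```

In a real module the member is a `struct rw_semaphore`, and kernel 5.8 also provides wrappers such as `mmap_read_lock()` that avoid touching the member directly. Items 2 and 3 cannot be handled this way, because the symbols they need are not exported at all.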
Comment 1 Larry Finger 2020-07-06 21:39:13 UTC
I assume that the continuing failure to build VB on Kernel_HEAD_standard means that you have decided against making the special kernel changes proposed in my patch.

I have been unable to find a work-around, thus I will soon push a patch that will allow VB to build with kernel 5.8; however, it will not run. Once 5.8 is released and becomes the default kernel in Tumbleweed, users will be forced to switch to use KVM rather than VB.
Comment 2 Jiri Slaby 2020-07-09 09:17:20 UTC
(In reply to Larry Finger from comment #1)
> I assume that the continuing failure to build VB on Kernel_HEAD_standard
> means that you have decided against making the special kernel changes
> proposed in my patch.

Not really, I personally haven't managed to get into it. The same would hold for Michal, I suppose.

I still think that vmalloc_module (IIRC the function name) should work. What VB does is only work around the fact that vmalloc_module was never exported, while its callees were. So exporting vmalloc_module should be the way to go...
Comment 3 Larry Finger 2020-07-09 19:12:39 UTC
(In reply to Jiri Slaby from comment #2)
> I still think that vmalloc_module (IIRC the function name) should work. What
> VB does is only work around the fact that vmalloc_module was never exported,
> while its callees were. So exporting vmalloc_module should be the way to go...

There is no vmalloc_module() in the kernel. The names of the interesting exported routines are vzalloc_node() and alloc_vm_area(). I thought routine __get_vm_area_caller() could be replaced with alloc_vm_area(); however, that approach only gets I/O memory. Similarly, vzalloc_node() gets memory in the VMALLOC range, not the MODULE range. An export of __vmalloc_node_range() would probably work.
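
As a rough illustration of why the range can matter: one likely reason (an assumption here, not something stated in this comment) is that x86-64 code built for the kernel code model reaches kernel symbols through 32-bit signed displacements, so it has to sit within 2 GiB of the kernel text. The sketch below checks that arithmetic in userspace using the default base addresses from the kernel's x86-64 memory-layout documentation (4-level paging, no KASLR). `reachable_rel32()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Default x86-64 bases (4-level paging, no KASLR) as listed in the kernel's
 * Documentation/x86/x86_64/mm.rst. Real systems may randomize them. */
#define KERNEL_TEXT_BASE 0xffffffff80000000ULL
#define MODULES_BASE     0xffffffffa0000000ULL
#define VMALLOC_BASE     0xffffc90000000000ULL

/* Hypothetical helper: returns 1 if a 32-bit signed displacement can span
 * the distance from `from` to `to`, 0 otherwise. */
static int reachable_rel32(uint64_t from, uint64_t to)
{
    int64_t delta = (int64_t)(to - from);

    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

With these bases, `reachable_rel32(MODULES_BASE, KERNEL_TEXT_BASE)` holds because the two are 512 MiB apart, while `reachable_rel32(VMALLOC_BASE, KERNEL_TEXT_BASE)` does not, since the gap is tens of terabytes.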

To recreate map_kernel_range() would require duplicating a lot of the memory management code. For example, I had to copy 140 lines of code from mm/vmalloc.c to get the code to compile, and that left 5 undefined symbols when depmod was run.

It certainly appears that we will need to export either __get_vm_area_caller() or __vmalloc_node_range(), plus map_kernel_range().
Comment 4 Jiri Slaby 2020-07-10 05:09:48 UTC
Just found it out. I was thinking about exporting and using module_alloc.
Comment 5 Jiri Slaby 2020-07-10 11:04:42 UTC
(In reply to Larry Finger from comment #3)
> It certainly appears that we will need to export either
> __get_vm_area_caller() or __vmalloc_node_range(), plus map_kernel_range().

Let's get our mm guys into the loop. VB here needs to allocate EXECutable pages -- it currently opencodes __vmalloc_node_range. So it needs all the callees to be exported, and those changed in 5.8. IMO what it wants is something like module_alloc. Or __vmalloc_node_range. (Either of those, exported.) Could you anticipate whether exporting any of those would be accepted upstream?
Comment 6 Larry Finger 2020-07-10 14:40:53 UTC
I have deliberately not contacted upstream. For the last few cycles, it has been clear that they do not want any external code acquiring executable memory. With earlier changes, they left enough wiggle room that it was possible to get VB to run, but with 5.8, they shut the door rather tightly. You may have enough clout to change their minds - I know I do not.

As I said in the initial E-mail exchange, Oracle is going to need to address this themselves. If we add those two exports to our kernel, we will be able to run VB with 5.8, and be the only Linux distro that can.

I think that module_alloc() could be made to work. The only problem would be the conversion of the void * that it returns into a struct vm_struct * that VB needs.
Comment 7 Michal Hocko 2020-07-13 07:25:43 UTC
Sorry for being late here. So the upstream situation is indeed as Larry describes it. There is a huge opposition to any interfaces which allow executable mappings. Virtual Box has to find a way around that. Is anybody at Oracle working on that? Because they have to somehow deal with that as well.
Comment 8 Larry Finger 2020-07-13 19:15:01 UTC
Oracle is aware of the problem as I posted about it on the developers mailing list and a defect (Bugzilla) entry was created. No response from them.

The main question here is do we wish to break all VB hosts on TW when kernel 5.8.0 becomes the default kernel?
Comment 9 Michal Hocko 2020-07-14 07:00:51 UTC
(In reply to Larry Finger from comment #8)
[...]
> The main question here is do we wish to break all VB hosts on TW when kernel
> 5.8.0 becomes the default kernel?

I believe this is for Oracle to get resolved. I do not think we want to diverge from the upstream kernel here.
Comment 10 Larry Finger 2020-07-14 13:56:23 UTC
That is fine. I will be able to point all the user complaints and bug reports here.
Comment 11 Michal Hocko 2020-07-14 14:49:06 UTC
(In reply to Larry Finger from comment #10)
> That is fine. I will be able to point all the user complaints and bug
> reports here.

I would recommend pointing those complaints at Oracle, who owns the code and is able to do something about that.
Comment 12 Hans-Peter Jansen 2020-08-07 09:58:21 UTC
Dear Larry,

sorry to see you in such troubled waters. 

It would be nice to add a reference to the Oracle bug in question, though.

@Jiri, Michal: Since 5.8 is expected to enter TW soon, do I really need to take precautions to not let 5.8 slip through, since I strongly depend on an operational VB in various (iow. most) TW installations that I manage out there?
Comment 13 Michal Hocko 2020-08-07 10:22:55 UTC
(In reply to Hans-Peter Jansen from comment #12)
> Dear Larry,
> 
> sorry to see you in such troubled waters. 
> 
> It would be nice to add a reference to the Oracle bug in question, though.
> 
> @Jiri, Michal: Since 5.8 is expected to enter TW soon, do I really need to
> take precautions to not let 5.8 slip through, since I strongly depend on
> an operational VB in various (iow. most) TW installations that I manage out
> there?

I believe you should talk to Oracle to update their driver.
Comment 14 Hans-Peter Jansen 2020-08-07 12:32:36 UTC
The persons in charge @Oracle always make me feel like reporting issues is an unpleasant disturbance. Consequently, they tend to brush them off in an unkind way.

I wouldn't even blame the teams behind those projects, but obviously, the Oracle Management seems to put pressure everywhere, rather than establishing a successful company culture where people feel good and like what they do.

Needless to say, this doesn't work well in open source projects, cough, OpenOffice, cough.

If somebody has a pointer, I'm willing to follow upstream development, but will refrain from interacting with it if possible. 

If you guys refrain from accepting Larry's patch as a temporary measure to deal with this nuisance, which is understandable from a certain POV, my best option seems to be rolling Larry's patch into a custom kernel build.
Comment 15 Jiri Slaby 2020-08-10 05:55:07 UTC
The "upstream" bug:
https://www.virtualbox.org/ticket/19644

There are various comments like:
> Well, I can confirm that the version VirtualBox-6.1.97-139689-Linux_amd64 runs just fine with Kernel 5.8_RC7 on Ubuntu MATE 20.04

So it might be fixed in some version of vbox already.
Comment 16 Jiri Slaby 2020-08-10 06:17:33 UTC
(In reply to Hans-Peter Jansen from comment #14)
> my best option seems to be rolling Larry's patch into a custom kernel build.

Larry builds such a kernel somewhere IIUC.

(In reply to Hans-Peter Jansen from comment #12)
> @Jiri, Michal: Since 5.8 is expected to enter TW soon, do I really need to

FWIW 5.8 already reached factory and will be released as soon as QA is finished.
Comment 17 Larry Finger 2020-08-12 19:36:39 UTC
See bsc#1175201 for a set of fixes.

*** This bug has been marked as a duplicate of bug 1175201 ***