Thursday, November 23, 2023

Porting the Linux kernel to WebAssembly

Ok, but why?


WebAssembly is an execution sandbox. It has no intrinsic functionality whatsoever. Any kind of interaction with the outside (functions, memory, global variables) has to be imported. There are no syscalls and there is no standard library. In short, an entire OS is missing.
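To make "everything has to be imported" a bit more concrete, here is a minimal sketch that hand-assembles a tiny wasm binary and lists its import section. The module and the import names (`env`, `host_write`, `memory`) are made up for illustration, and the parser only handles the four core import kinds.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128, as used by the wasm binary format."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; returns (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def name(text):
    raw = text.encode("utf-8")
    return uleb128(len(raw)) + raw

def section(section_id, payload):
    return bytes([section_id]) + uleb128(len(payload)) + payload

# Hypothetical module: it imports one host function and one linear memory.
TYPE_SECTION = section(1, b"\x01" + b"\x60\x00\x00")      # one type: () -> ()
IMPORT_SECTION = section(2, b"\x02"
    + name("env") + name("host_write") + b"\x00\x00"      # func, type index 0
    + name("env") + name("memory") + b"\x02\x00\x01")     # memory, no max, min 1 page
MODULE = b"\x00asm" + b"\x01\x00\x00\x00" + TYPE_SECTION + IMPORT_SECTION

KINDS = {0: "func", 1: "table", 2: "memory", 3: "global"}

def read_name(buf, pos):
    length, pos = read_uleb128(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length

def skip_limits(buf, pos):
    flags = buf[pos]
    _, pos = read_uleb128(buf, pos + 1)       # minimum
    if flags & 0x01:
        _, pos = read_uleb128(buf, pos)       # maximum, only if flagged
    return pos

def list_imports(module):
    """Return (module, field, kind) for every entry in the import section."""
    if module[:4] != b"\x00asm":
        raise ValueError("not a wasm module")
    pos, imports = 8, []                      # skip magic and version
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        end = pos + size
        if section_id == 2:                   # import section
            count, pos = read_uleb128(module, pos)
            for _ in range(count):
                mod, pos = read_name(module, pos)
                field, pos = read_name(module, pos)
                kind = module[pos]
                pos += 1
                if kind == 0:
                    _, pos = read_uleb128(module, pos)   # type index
                elif kind == 1:
                    pos = skip_limits(module, pos + 1)   # reftype, then limits
                elif kind == 2:
                    pos = skip_limits(module, pos)
                elif kind == 3:
                    pos += 2                             # value type + mutability
                imports.append((mod, field, KINDS[kind]))
        pos = end
    return imports

print(list_imports(MODULE))
# prints [('env', 'host_write', 'func'), ('env', 'memory', 'memory')]
```

Nothing in this module can do anything useful until the host supplies both imports, which is exactly the gap an OS would normally fill.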

Existing solutions try to solve this by providing an application runtime. Nearly all of them focus on a POSIX runtime, either by hacking and extending an existing libc implementation (e.g. Emscripten) or by writing a new one entirely from scratch (e.g. Wasix). This generally works OK-ish but has some inherent drawbacks, the most prominent being the limited portability of existing applications.

Linux applications are the primary source of potential wasm applications because of their open-source nature. However, a POSIX runtime is too limited compared to what is available on Linux. The POSIX runtimes that are currently available are also not suited for multi-application interaction: Wasix is currently Rust- and backend-focused, while Emscripten lacks any kind of interaction between multiple wasm applications.


Solution


What if we look at the WebAssembly sandbox as a special virtualised architecture? Because that's almost exactly what it is. We could port the Linux kernel to this new architecture and use the VirtIO framework to make the kernel talk to the outside world. The outside would be your browser, or whatever wasm engine you would run on the server-side.

It would look something like this:
Each application runs in its own isolated wasm sandbox and basically consists of an application wasm module and a kernel wasm module. The application module imports its own application memory, while the kernel module imports both the application memory and the shared kernel memory.

[For the uninitiated: a wasm module contains the executable wasm code and has nothing to do with kernel modules in the native world.]

This makes application memory accessible to both the application and the kernel, while kernel memory is only accessible to the kernel module. The kernel memory is shared with the kernel wasm modules inside other wasm sandboxes. This makes it possible for applications to interact through the kernel module, which shares its memory and state with the other kernel modules.
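The memory layout can be modelled in a few lines. This is only a toy sketch: plain byte arrays stand in for wasm linear memories, dictionaries stand in for the import objects, and every name (`Sandbox`, `kernel_publish`, `kernel_deliver`) is hypothetical. It shows who can see which memory, not how a real engine would wire it up.

```python
class Memory:
    """Stand-in for a wasm linear memory: a named block of bytes."""
    def __init__(self, name, size=64):
        self.name = name
        self.data = bytearray(size)

class Sandbox:
    """One application sandbox: an application module plus a kernel module."""
    def __init__(self, name, kernel_memory):
        self.app_memory = Memory(name + "-app")
        # The application module only ever imports its own memory.
        self.app_imports = {"memory": self.app_memory}
        # The kernel module imports the application memory and the shared kernel memory.
        self.kernel_imports = {
            "app_memory": self.app_memory,
            "kernel_memory": kernel_memory,
        }

    def app_write(self, offset, payload):
        mem = self.app_imports["memory"].data
        mem[offset:offset + len(payload)] = payload

    def app_read(self, offset, length):
        return bytes(self.app_imports["memory"].data[offset:offset + length])

    def kernel_publish(self, app_offset, length, kernel_offset):
        """Kernel side: copy application data into shared kernel memory."""
        src = self.kernel_imports["app_memory"].data
        dst = self.kernel_imports["kernel_memory"].data
        dst[kernel_offset:kernel_offset + length] = src[app_offset:app_offset + length]

    def kernel_deliver(self, kernel_offset, length, app_offset):
        """Kernel side: copy data from shared kernel memory into application memory."""
        src = self.kernel_imports["kernel_memory"].data
        dst = self.kernel_imports["app_memory"].data
        dst[app_offset:app_offset + length] = src[kernel_offset:kernel_offset + length]

shared_kernel_memory = Memory("kernel")
a = Sandbox("a", shared_kernel_memory)
b = Sandbox("b", shared_kernel_memory)

a.app_write(0, b"ping")        # application A writes into its own memory
a.kernel_publish(0, 4, 16)     # A's kernel module copies it into kernel memory
b.kernel_deliver(16, 4, 0)     # B's kernel module hands it to application B
print(b.app_read(0, 4))        # prints b'ping'
```

The two applications never see each other's memory or the kernel memory; the only path between them runs through the kernel modules.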


Leroy Jenkin' it


Somebody already managed to compile the Linux kernel (or rather lkl) to asm.js using Emscripten. Porting it to wasm should be easy and totally doable! asm.js is the predecessor of wasm, so all that should be left to do is change some compilation settings to get wasm instead of asm.js.

Sadly, things aren't that easy. The original asm.js port of the Linux kernel is based on an older Linux lkl version (4.x), so its patches had to be ported to a newer Linux lkl version (6.x). Next was the switch to wasm. It turns out wasm doesn't like the Linux spinlock implementation because it uses computed gotos, which wasm does not support. This can be fixed by using wasm's own built-in locking mechanism instead. After some trial and error, it was time to take a step back and look at the bigger picture. As it turns out, porting the Linux kernel to wasm requires a lot more work than some minimal patches on top of lkl.

The wasm standard deviates quite a lot from existing architectures. Wasm modules have no relation to the ELF format, so loading & linking is going to be problematic. There is no standard way to dynamically link wasm modules. Computed gotos are not supported and many more advanced instructions are not available. This effectively calls for wasm to be treated as an entirely new architecture inside the Linux kernel.

The first step is to investigate whether there is any way to link wasm modules at all. There is an unofficial standard implemented by LLVM, but sadly there is no stand-alone mapper & loader available. Cool, let's just implement it ourselves.

This experiment made clear how linking wasm modules works and what it needs, especially when dealing with multiple threads/processes in the form of web workers. The positions at which wasm modules are mapped into memory need to be cached per process, and updating and retrieving this information must be thread-safe.
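A rough sketch of such a cache is below. It is not the loader from the experiment: the class name, the bump allocation and the starting address are all assumptions made for illustration. In the LLVM convention, as far as I understand it, a module learns where it was placed through an imported global such as `__memory_base`; the value returned by `map_module` would be what gets passed in there.

```python
import threading

def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

class ModuleMapCache:
    """Per-process record of where each wasm module's data was mapped."""
    def __init__(self, first_free=1024):
        self._lock = threading.Lock()
        self._maps = {}        # pid -> {module name -> memory base}
        self._next_free = {}   # pid -> next unassigned address
        self._first_free = first_free

    def map_module(self, pid, module, size, alignment=16):
        """Return the module's base address, assigning one on first use."""
        with self._lock:       # lookup and update happen under one lock
            mapped = self._maps.setdefault(pid, {})
            if module in mapped:
                return mapped[module]
            base = align_up(self._next_free.get(pid, self._first_free), alignment)
            self._next_free[pid] = base + size
            mapped[module] = base
            return base

    def lookup(self, pid, module):
        with self._lock:
            return self._maps.get(pid, {}).get(module)

cache = ModuleMapCache()
results = []

def worker():
    # Every thread of process 1 asks for the same two (hypothetical) modules.
    libc_base = cache.map_module(1, "libc.wasm", size=4000)
    app_base = cache.map_module(1, "app.wasm", size=100)
    results.append((libc_base, app_base))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(set(results))   # every thread saw the same pair of base addresses
```

Because the check and the assignment happen under the same lock, two workers racing to map the same module cannot end up with different base addresses, and a second process gets its own independent layout.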

All of this is already done in some form by the Linux kernel for the ELF executable format: it first maps the file into memory and then hands over execution to the ld.so defined in the ELF file itself. The Linux kernel gives us some hooks in the form of binfmt_misc to load arbitrary executable formats, but more investigation is required to find out whether this is enough to load wasm modules.
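For reference, binfmt_misc matches files by magic bytes and hands them to a user-space interpreter, using a rule of the form `:name:type:offset:magic:mask:interpreter:flags`. The sketch below builds such a rule for the wasm magic (`\0asm`) and mimics the magic/mask comparison. The interpreter path `/usr/local/bin/wasm-loader` is made up, and whether a user-space interpreter is enough for this port is exactly the open question.

```python
WASM_MAGIC = b"\x00asm"
ELF_MAGIC = b"\x7fELF"

def escape(raw):
    """Write bytes as \\xNN escapes, which binfmt_misc accepts for magic and mask."""
    return "".join("\\x{:02x}".format(byte) for byte in raw)

def binfmt_rule(name, magic, interpreter, offset=0, mask=b"", flags=""):
    """Build a magic-based ('M') rule: :name:type:offset:magic:mask:interpreter:flags"""
    off = str(offset) if offset else ""
    return ":{}:M:{}:{}:{}:{}:{}".format(
        name, off, escape(magic), escape(mask), interpreter, flags)

def matches(data, magic, offset=0, mask=b""):
    """Mimic the magic/mask check: compare the masked bytes at the given offset."""
    window = data[offset:offset + len(magic)]
    if len(window) < len(magic):
        return False
    if not mask:
        mask = b"\xff" * len(magic)
    return all((d & m) == (g & m) for d, g, m in zip(window, magic, mask))

# Hypothetical interpreter path; the line would be written to
# /proc/sys/fs/binfmt_misc/register to install the rule.
rule = binfmt_rule("wasm", WASM_MAGIC, "/usr/local/bin/wasm-loader")
print(rule)

wasm_header = WASM_MAGIC + b"\x01\x00\x00\x00"   # magic + version 1
elf_header = ELF_MAGIC + b"\x02\x01\x01\x00"
print(matches(wasm_header, WASM_MAGIC))          # True
print(matches(elf_header, WASM_MAGIC))           # False
```

Recognising a wasm file is the easy part; the mapping and linking work described above would still have to happen somewhere, either in that interpreter or in the kernel itself.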

This exploration barely scratches the surface but already reveals some of the things that will be needed to port the Linux kernel to wasm.

Edit 3 September 2024: An exciting, ongoing port of the Linux kernel to wasm can be found here: https://github.com/tombl/linux

To be continued.
