It's really the other way around: we joined Fastly because we knew it's a place where we could do this kind of work in the open.
None of the code involved here existed a year ago, and none of it was somehow forced to be open source. (Also, the code in these two repositories is in many ways the most boring part of this solution. See this blog post for an excellent overview of the more interesting parts: https://bytecodealliance.org/articles/making-javascript-run-...)
While there was no acquisition involved, a whole group of folks working on WebAssembly at Mozilla (myself included) moved to Fastly last Fall. What I tried to emphasize is that instead of the projects at hand here being open source because they somehow had to be, we all joined Fastly because it's a place where this kind of project can be made open source (and be created in the first place!) :)
For languages that can express unforgeable pointers as a first-class concept, that is indeed a very attractive, fine-grained approach. Unfortunately, bringing that to languages like C/C++/Rust is a different matter altogether.
Since we want to support those languages as first-class citizens, we can't require GC support as a base concept, so we have to treat a nanoprocess as the unit of isolation from the outside.
Once we have GC support, nothing will prevent languages that can use it from expressing finer-grained capabilities even within a nanoprocess, and that seems highly desirable indeed.
(full disclosure: I'm a Mozilla employee and one of the people who set up the Bytecode Alliance.)
That future possibility reminds me of https://en.wikipedia.org/wiki/Singularity_(operating_system) - where process/address-space isolation was replaced with fine-grained static verification of high-level code (presumably not the first experiment in this area).
Indeed: that and many other things are prior art in this space. And there is a lot of prior art for what we're working on—this is not meant as an academic research project! :)
Yes, one of the answers I want to give any time someone asks "why will WASM succeed when the JVM didn't" is that there is 25 years more experience and research to draw upon.
And yet bounds-checked access validation was left out of the design, something most previous research projects took care to address by tainting packages as unsafe when such accesses were present.
> For languages that can express unforgeable pointers as a first-class concept, that is indeed a very attractive, fine-grained approach. Unfortunately bringing that to languages like C/C++/Rust is a different matter altogether.
The semantics of these languages aren’t incompatible with unforgeable references, though: it generally works in practice, but it’s technically undefined to create pointers out of thin air. Why can’t we take advantage of the standard here to disallow illegally created references? (Which, as I understand it, many other vendors are already beginning to do with e.g. pointer authentication and memory tagging.)
What would allow other languages to represent unforgeable pointers as a first-class concept and not C/C++/Rust?
Forging a pointer is UB in all of these languages as far as I know.
It seems like you should be able to have opaque types that represent these unforgeable pointers which you can't do arithmetic on or cast to raw pointers, but can access values in type safe ways, or provide a view to a byte slice which does bounds check on access.
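To make the idea concrete, here is a minimal sketch in Rust of what such an opaque type could look like. The names (`ByteCap`, `get`, `view`) are hypothetical, not part of any real API: the point is only that outside the module there is no pointer arithmetic and no cast to a raw pointer, just bounds-checked accessors.

```rust
// Sketch of an opaque, unforgeable handle (hypothetical names).
mod caps {
    pub struct ByteCap {
        data: Vec<u8>, // backing store; never exposed directly
    }

    impl ByteCap {
        pub fn new(data: Vec<u8>) -> ByteCap {
            ByteCap { data }
        }

        /// Bounds-checked single-byte read; returns None instead of trapping.
        pub fn get(&self, idx: usize) -> Option<u8> {
            self.data.get(idx).copied()
        }

        /// A read-only, bounds-checked view of a sub-range.
        pub fn view(&self, start: usize, len: usize) -> Option<&[u8]> {
            self.data.get(start..start.checked_add(len)?)
        }
    }
}

fn main() {
    let cap = caps::ByteCap::new(vec![10, 20, 30]);
    assert_eq!(cap.get(1), Some(20));
    assert_eq!(cap.get(3), None); // out of bounds: no UB, just None
    assert_eq!(cap.view(1, 2), Some(&[20u8, 30u8][..]));
    assert_eq!(cap.view(2, 5), None);
}
```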
Is there a good place for discussion of this design? I seem to be having this conversation with you and Josh both here and on Reddit, and it seems like a lot of the discussion is spread out in a lot of places.
In unsafe rust you can arbitrarily increase the length of a vector/string by modifying the stored length. You do not need to forge the pointer itself to break the pointer's invariant.
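A small, well-defined demonstration of that point: the code below writes into a `Vec`'s spare capacity behind its back and then bumps the stored length with `set_len`. No pointer is ever forged, yet safe code afterwards observes elements that were never `push`ed.

```rust
use std::ptr;

fn main() {
    let mut v: Vec<u8> = Vec::with_capacity(8);
    v.push(1);
    unsafe {
        // Initialize the spare capacity directly, bypassing the safe API...
        for i in 1..8 {
            ptr::write(v.as_mut_ptr().add(i), 0xAA);
        }
        // ...then modify the stored length. The Vec's invariant about
        // which elements exist was broken without forging any pointer.
        v.set_len(8);
    }
    assert_eq!(v.len(), 8);
    assert_eq!(v[7], 0xAA); // visible to safe code, never pushed
}
```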
You would need to do either static or dynamic bounds checking when accessing memory via these capabilities. You obviously can't just give arbitrary code a pointer and let it read however far it wants past the end of it.
Given that most code in Rust is safe code and includes bounds checks before access, you should be able to have the verifier rely on those when they exist, and add in bounds checks in cases in which the access is not protected by a bounds check.
Maybe that would be intractable, or too inefficient to be worth it with all of the extra bounds checks. I'm not sure. I'm asking because it's something that I feel should be possible, but I haven't been involved in the research or development, so I'm wondering if those who have been more involved have references to discussion about the topic.
That is how we support references in the Rust toolchain right now, via wasm-bindgen, and it's an important part of making unforgeable references work for languages that rely on linear memory.
It doesn't help with making capabilities more fine-grained, though: we have to treat all code that has access to that table as having the same level of trust.
To expand on this, capabilities allow us to go further than pledge(2): it enables selective forwarding of capabilities to other nanoprocesses, such as only forwarding a handle to a single file out of a directory, or a read-only handle from a read-write one, etc...
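A toy Rust sketch of that narrowing pattern (all names here are hypothetical, not WASI's actual API): a directory capability can be attenuated before being forwarded, either down to a single file or from read-write to read-only, while the reverse direction (amplifying rights) simply isn't expressible.

```rust
// Toy capability sketch (hypothetical names, no real filesystem access).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Rights { ReadOnly, ReadWrite }

struct DirCap { path: String, rights: Rights }
struct FileCap { path: String, rights: Rights }

impl DirCap {
    /// Forward a handle to a single file instead of the whole directory.
    fn open_file(&self, name: &str) -> FileCap {
        FileCap { path: format!("{}/{}", self.path, name), rights: self.rights }
    }

    /// Derive a read-only capability from a read-write one. No method
    /// exists to go the other way, so rights can only shrink.
    fn read_only(&self) -> DirCap {
        DirCap { path: self.path.clone(), rights: Rights::ReadOnly }
    }
}

fn main() {
    let dir = DirCap { path: "/preopen/data".into(), rights: Rights::ReadWrite };
    let file = dir.read_only().open_file("config.toml");
    assert_eq!(file.path, "/preopen/data/config.toml");
    assert_eq!(file.rights, Rights::ReadOnly);
}
```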
I fear that at the end of the day, capabilities will have the same fate as other sandboxing mechanisms: nobody will use them. And, just to make their application work and avoid the support burden, developers will tell people to use a setup that grants access to everything.
pledge(2) and unveil(2) learned from the past and are way simpler. I really wish WebAssembly had adopted similar mechanisms.
Agreed, and there are a lot of UX questions to sort out. Many security concepts took many attempts to figure out in full (or to the extent that they have been figured out :))
One important aspect here is that this doesn't just target whole apps. It also targets developers using dependencies: while it's desirable to restrict an application's capabilities, there's a lot of value in developers only giving packages they depend on very limited sets of capabilities. And that seems much more tractable, given that kitchen-sink packages aren't what most people want to use anyway.
On 1, the libc we're working on[1] is based on musl. It won't ever be 100% compatible with all code, because that runs into constraints imposed by our security goals, but the vast majority of code should eventually just compile when targeting this. (Eventually, because this is all early days.)
On 2, yes, that is explicitly the goal. I'd add that it's not just about OSes, but also about platforms and hardware form factors.
We've mainly based the current design on CloudABI/Capsicum, but it's all early days, and Fuchsia is on our list of systems to at the very least take heavy inspiration from :)
And the layout of structs, strings, etc is up to the compiler, within the bounds of the restrictions WebAssembly imposes.
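As a small illustration that layout is a toolchain decision rather than something Wasm dictates: in Rust, the default layout of a struct is unspecified, and only an explicit `#[repr(C)]` pins it down to C's rules (here, 3 bytes of padding after `tag` so that `len` is 4-byte aligned).

```rust
use std::mem::{align_of, size_of};

// With repr(C), the compiler must follow C layout rules.
#[repr(C)]
struct Header {
    tag: u8,  // 1 byte, followed by 3 bytes of padding
    len: u32, // aligned to 4 bytes
}

fn main() {
    assert_eq!(size_of::<Header>(), 8);
    assert_eq!(align_of::<Header>(), 4);
}
```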
We'll definitely have a test suite, but this is all early days, so a lot of all that isn't yet in place.
And yes, this can be targeted by LLVM-based and other compilers. In fact, Emscripten could use this as the foundation for their POSIX-like libc and library packages. The syscalls are indeed exposed as Wasm function imports.
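Concretely, in the WebAssembly text format a WASI syscall looks like any other function import: the module declares a signature under the `wasi_snapshot_preview1` namespace (shown here with `fd_write`), and the host engine supplies the implementation.

```wat
;; A WASI syscall as seen from a module: a plain function import.
(module
  (import "wasi_snapshot_preview1" "fd_write"
    (func $fd_write (param i32 i32 i32 i32) (result i32))))
```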
/tmp/[DE][AD][BE][EF].txt # ext2 / linux
# OR
C:\stuff\[DEED][FFFE].txt # ntfs / windows
# where [hex] indicates a single filesystem character with that value
One fun thing about the capability model is that at the system call level, there are no absolute paths. All filesystem path references are relative to base directory handles. So even if an application thinks it wants something in C:\stuff, it's the job of the libraries linked into the application to map that to something that can actually be named. So there's room for the ecosystem to innovate, above the WASI syscall layer, on what "C:\" should mean in an application intending to be portable.
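That resolution step can be sketched in a few lines of Rust. This is a toy model, not WASI's actual implementation: a hypothetical `resolve` function maps a guest path against a preopened directory, rejecting absolute paths outright and refusing `..` components that would escape the base.

```rust
use std::path::{Component, Path, PathBuf};

// Toy sketch (hypothetical): resolve a guest path against a preopened
// directory handle, mirroring how WASI paths are always relative to a
// base directory rather than to a global root like "C:\" or "/".
fn resolve(preopen: &Path, guest: &Path) -> Option<PathBuf> {
    if guest.is_absolute() {
        return None; // no absolute paths at the syscall layer
    }
    let mut out = preopen.to_path_buf();
    for c in guest.components() {
        match c {
            Component::Normal(p) => out.push(p),
            Component::CurDir => {}
            _ => return None, // `..`, roots, prefixes: refuse to escape
        }
    }
    Some(out)
}

fn main() {
    let base = Path::new("/preopen/tmp");
    assert_eq!(resolve(base, Path::new("a/b.txt")),
               Some(PathBuf::from("/preopen/tmp/a/b.txt")));
    assert_eq!(resolve(base, Path::new("/etc/passwd")), None);
    assert_eq!(resolve(base, Path::new("../escape")), None);
}
```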
Concerning character encodings, and potentially case sensitivity, the current high-level idea is that paths at the WASI syscall layer will be UTF-8, and WASI implementations will perform translation under the covers as needed. Of course, that doesn't fix everything, but it's a starting point.
That’s good to know, but the parent’s examples seem to be referencing the issue of filenames that aren’t valid Unicode. The Linux example is invalid UTF-8, since Linux filenames are natively arbitrary byte sequences. The Windows example contains an unpaired surrogate followed by the reserved codepoint 0xfffe, since Windows filenames are natively UCS-2.
I have a dollar that says all platform difference issues will be solved by just doing whatever POSIX does and expecting the host OS to figure it out if it isn't already POSIX. Whenever you try to abstract away arbitrarily different implementations while retaining their non-common functionality you either end up reimplementing one of them and expecting the others to work around it, or you end up forcing the programmer to bypass the abstraction anyway and implement logic for each implementation.
I have worked on file APIs. There are so many differences between Windows and POSIX that abstracting them away just doesn't work. Undoubtedly, there will eventually be platform-specific APIs that implement one or the other, and cross-platform APIs that implement the intersection.
It's a good question. WASI currently doesn't allow you to set custom access-control permissions when creating files. But we're just getting started, so if we can find a design that works, we can add it.
This is indeed a problem for Wasm/JS integration. The JS WeakRef proposal[1] will address it for many use cases, and the WebAssembly GC proposal[2], combined with JS Typed Objects[3] will address many others. Even before those features will be available, I'd expect the community to iterate on patterns to make APIs nicer.