I found this video https://www.youtube.com/watch?v=r-A78RgMhZU "A Talk Near the Future of Python (a.k.a., Dave live-codes a WebAssembly Interpreter)" to be a brilliant introduction to WASM as well as writing interpreters in general. I'm a relative novice in the subject and it was pitched right at my level.
Dave Beazley is a truly amazing presenter. Another favorite of mine is his epic tale of how he ended up demolishing the opposing side's case in a civil lawsuit... by the sheer luck of having a Python interpreter available. For your viewing pleasure:
Very nice. I like these tutorials showing the nuts and bolts of wasm and C without just throwing it at the Emscripten toolchain.
I'm curious whether there are perf differences between a 2D canvas and a WebGL canvas. This project uses just the 2D canvas, but IIRC passing frames to be rendered by WebGL is faster. Perhaps I'm wrong in this context.
I also don't see threading in here. Makes sense for a demo, but if this were to be used performantly you'd have to throw it all in a web worker so it doesn't block the main thread. This is one point of contention with wasm, because it's not straightforward to render to a canvas/WebGL context on the main thread from a worker thread. OffscreenCanvas is one workaround, but it's not supported by Firefox or Safari.
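For reference, a minimal sketch of the OffscreenCanvas route (file name and message shape are my own invention): the canvas is transferred to the worker once, and the worker then renders into it directly instead of posting frames back:

```typescript
// main.ts: hand rendering control of the canvas to the worker
const canvas = document.querySelector("canvas")!;
const offscreen = canvas.transferControlToOffscreen();
const worker = new Worker("render-worker.js");
worker.postMessage({ canvas: offscreen }, [offscreen]); // Transferable: moved, not copied

// render-worker.ts: the game loop can now draw without touching the main thread
self.onmessage = (e: MessageEvent) => {
  const ctx = (e.data.canvas as OffscreenCanvas).getContext("2d")!;
  ctx.fillStyle = "black";
  ctx.fillRect(0, 0, 320, 200); // render frames here
};
```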
Seems to me the original rendering pipeline wasn't GPU based. The author is just dumping whatever the game renders into a canvas element; there is no need for WebGL for that.
Also Canvas is getting some GPU acceleration in some browsers:
There's also the problem of getting keyboard input in and out of the web worker in a performant manner.
I tried this a few years ago with a Gameboy emulator I had ported from Go to WebAssembly, and used web workers to run the emulator in.
Getting the keyboard input in, in a performant way, was a real struggle using postMessage, although I'll admit I'm not the best at web programming, so someone more skilled might have been able to do it better.
I'd ported the emulator to WASM from one I'd half written a few years previously, and engineered myself into a corner (+ like I said, not a web programmer)
The emulator ran in the worker, and the inputs were handled on the main thread.
The output (i.e. display) was pushed from the worker -> main thread
I think if I were to do it from scratch, I'd do something similar to the Doom approach here and use requestAnimationFrame etc.
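Roughly what I have in mind (worker file name and message shapes made up): keyboard events go down to the worker with postMessage, frames come back up, and requestAnimationFrame paces the blit:

```typescript
// main.ts: input goes to the worker, frames come back, rAF paces the blit
const worker = new Worker("emulator-worker.js");
const ctx = (document.querySelector("canvas") as HTMLCanvasElement).getContext("2d")!;

addEventListener("keydown", (e) => worker.postMessage({ type: "key", code: e.code, down: true }));
addEventListener("keyup",   (e) => worker.postMessage({ type: "key", code: e.code, down: false }));

let frame: ImageData | null = null;
worker.onmessage = (e: MessageEvent) => {
  // the worker posts raw RGBA pixels (ideally as a transferred ArrayBuffer)
  frame = new ImageData(new Uint8ClampedArray(e.data.pixels), 160, 144);
};

function draw() {
  if (frame) ctx.putImageData(frame, 0, 0);
  requestAnimationFrame(draw);
}
requestAnimationFrame(draw);
```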
The web worker where the wasm is running would be the one that handles the whole game, and the game has to be told the inputs. Inputs are only grabbable on the main thread, and rendering is only done on the main thread. If you run the game in the main thread too, you can end up with your browser tab becoming unresponsive, or have delays in the input and rendering that affect the game's performance.
It's the same as when you make a Python UI all in one thread: you can't receive user input and also do some long task at the same time.
From what I found on MDN, "a side effect to the block in one agent will eventually become visible in the other agent". What does the word "eventually" mean there? What's going on under the hood?
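For context, the "agents" there are threads/workers sharing a SharedArrayBuffer. As I understand it (not authoritative), "eventually" is the usual relaxed-memory caveat: a plain write has no guaranteed visibility timing in another agent, while Atomics operations are sequentially consistent and wait/notify give you explicit synchronization. A minimal sketch, assuming the page is cross-origin isolated so SharedArrayBuffer is available:

```typescript
// one shared 32-bit slot visible to both agents
const sab = new SharedArrayBuffer(4);
const slot = new Int32Array(sab); // pass `sab` to the worker via postMessage

// producer (e.g. main thread): publish a value, then wake waiters
Atomics.store(slot, 0, 42); // sequentially-consistent write
Atomics.notify(slot, 0);    // wake agents blocked on index 0

// consumer (inside the worker; Atomics.wait is disallowed on the main thread)
Atomics.wait(slot, 0, 0);            // sleep while slot[0] is still 0
const value = Atomics.load(slot, 0); // guaranteed to observe the store
```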
> Doom has a global variable screens[0] which is a byte array of SCREENWIDTH*SCREENHEIGHT, i.e. 320x200 with the current screen contents.
It would seem to me that the right approach would be to hoist out Doom's main loop so you just have a renderFrame() function, then put something on the main browser thread to "blit" the image into the canvas itself.
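A sketch of that blit (the renderFrame/screenPtr exports and the palette plumbing are hypothetical; Doom's actual palette comes from the PLAYPAL lump): expand the 8-bit palette indices to RGBA and push the result into the canvas each animation frame:

```typescript
const W = 320, H = 200;
const ctx = (document.querySelector("canvas") as HTMLCanvasElement).getContext("2d")!;
const frame = ctx.createImageData(W, H);

declare const palette: Uint8Array; // 256 RGB triples, e.g. from the PLAYPAL lump
declare const exports: { renderFrame(): void; screenPtr(): number; memory: WebAssembly.Memory };

function blit() {
  exports.renderFrame(); // hypothetical export: advance the game one frame
  const src = new Uint8Array(exports.memory.buffer, exports.screenPtr(), W * H);
  for (let i = 0; i < W * H; i++) {
    const c = src[i] * 3;             // palette index -> offset into RGB triples
    frame.data[i * 4 + 0] = palette[c + 0];
    frame.data[i * 4 + 1] = palette[c + 1];
    frame.data[i * 4 + 2] = palette[c + 2];
    frame.data[i * 4 + 3] = 255;      // fully opaque
  }
  ctx.putImageData(frame, 0, 0);
  requestAnimationFrame(blit);
}
requestAnimationFrame(blit);
```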
It's incredible how far the web has come. I remember the first time I saw a browser GameBoy emulator and I was amazed. Maybe I should port my GB emulator to WASM...
For one thing, I’m unlikely to download a native copy of Doom to run on my own machine from a strange website. The ability to run cross-platform code that uses my GPU in a secure sandbox is pretty neat to me.
This is exactly the kind of tutorial I've been waiting for for years. The way blocks and breaks work is especially non-intuitive if you are used to either assembly or regular languages, and you START with it. Good work, really loving this tutorial!
I love seeing this kind of tutorial, that isn't just a step-by-step guide, but also an exploration of the thought process and trial-and-error that goes on in crafting each step, so thanks for sharing.
Looks like a lot of the work on the Doom port (https://github.com/diekmann/wasm-fizzbuzz/tree/main/doom) is about getting common functions from the C standard library to work in WASM. Surely this seems like a good opportunity for a new Free Software initiative - something optimized, properly licensed/credited and easy for everybody to use?
I can't tell if the lack of strings and DOM API interop in web assembly is on purpose or not.
If it is on purpose, what an absolutely diabolical way to ensure javascript language dominance in the browser: give people a way to port their language to the browser, but make it incredibly difficult to do anything.
Afaik for the WebAssembly MVP, the goal was to have a simple, efficient compile target - therefore only integers and floats. To make wasm more useful & easier to integrate, the plan calls for interface types[0], which allow both accessing complex (JS) objects and calling browser APIs.
What type of strings though? Exposing Javascript string objects in WASM doesn't make much sense if the code is expecting C strings for instance. Same for other languages, those all have their own incompatible internal representations for strings. The only somewhat interop-friendly string type is a zero-terminated bag of bytes, usually UTF-8 encoded (aka C strings), but that's a different string representation than Javascript uses.
The Emscripten SDK offers helper functions to marshal high level data types like Javascript strings to UTF-8 encoded C strings on the WASM heap and back to JS, so it's not that bad.
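Roughly like this, assuming _malloc/_free and the string helpers are exported (e.g. via EXPORTED_RUNTIME_METHODS), and with process_string standing in for whatever C export you're actually calling:

```typescript
declare const Module: any; // the Emscripten module object

const js = "Hello, wasm";

// JS string -> UTF-8 C string on the WASM heap
const nBytes = Module.lengthBytesUTF8(js) + 1; // +1 for the NUL terminator
const ptr = Module._malloc(nBytes);
Module.stringToUTF8(js, ptr, nBytes);

Module._process_string(ptr); // hypothetical C export: void process_string(char *s)

// C string on the heap -> JS string, then release the allocation
const back = Module.UTF8ToString(ptr);
Module._free(ptr);
```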
DOM access can be achieved with helper libraries which call out into JS. And since any sort of DOM manipulation is extremely slow anyway there's not much of a performance difference even with the overhead of calling out from WASM into JS (which actually is quite fast nowadays).
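A minimal sketch of that pattern (wasm file and import names invented): the module imports a JS function, hands it a pointer/length into linear memory, and the JS side decodes the bytes and touches the DOM:

```typescript
let memory: WebAssembly.Memory;

const imports = {
  env: {
    // hypothetical import, declared on the C side as:
    //   extern void set_text(const char *ptr, int len);
    set_text(ptr: number, len: number) {
      const bytes = new Uint8Array(memory.buffer, ptr, len);
      document.getElementById("out")!.textContent = new TextDecoder().decode(bytes);
    },
  },
};

const { instance } = await WebAssembly.instantiateStreaming(fetch("app.wasm"), imports);
memory = instance.exports.memory as WebAssembly.Memory;
// the module can now call env.set_text(...) whenever it wants to update the page
```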
This would still require conversion from and to Javascript strings, and doesn't help with any language compiled to WASM that isn't Rust. And it probably wouldn't even help Rust because such a native WASM string type would presumably live outside the WASM heap (because if the string data would be on the WASM heap, there's no need for a native string type).
I was arguing against the idea of "strings are implemented differently by different languages, so WHICH one to choose?".
A safe one. Your sample is that, though I think it's better if it's a UTF-8 string; this one works for me too. What would be worrisome is if it were made to work like C's.
Rust (or C++) strings are not Pascal strings. In Pascal strings, the "string buffer" also contains the length information; historically it was all bytes, with a length byte at the start, which is why your strings started at index 1 and were limited to 255 bytes.
It's possible to modernise this style of strings to be less crummy (that is essentially what sds does), but C++/Rust strings are a third take, where the length (and capacity) are stored separately from the string buffer, and that buffer is always on the other side of a pointer (ignoring SSO, which Rust sadly doesn't have due to the original interface definition).
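To make the three layouts concrete, here's how a host would read each one out of a WASM module's linear memory (a sketch; `mem` stands in for a byte view over the heap):

```typescript
declare const mem: Uint8Array; // view over the module's linear memory
const utf8 = new TextDecoder();

// C string: no length anywhere, scan for the NUL terminator
function readCString(ptr: number): string {
  let end = ptr;
  while (mem[end] !== 0) end++;
  return utf8.decode(mem.subarray(ptr, end));
}

// classic Pascal string: one length byte, then up to 255 bytes of data
function readPascalString(ptr: number): string {
  return utf8.decode(mem.subarray(ptr + 1, ptr + 1 + mem[ptr]));
}

// Rust &str / C++ string data: the length travels separately from the buffer,
// so the host must be handed a (ptr, len) pair rather than one pointer
function readSlice(ptr: number, len: number): string {
  return utf8.decode(mem.subarray(ptr, ptr + len));
}
```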
I think if you're afraid of memory corruption inside the WASM heap, it's better to use Rust instead of C or C++. WASM's job is to prevent code inside the sandbox from escaping the sandbox, not to prevent memory corruption inside the sandbox.
When they are finished with the WebAssembly roadmap, it will have the same sandbox as a typical OS process, no different from running an ART executable on Android, bitcode on watchOS, MSIL on Windows, or TIMI on IBM i.
Of course, that is the plan, but even then it will still be possible to run WebAssembly modules with no permissions or limited permissions, as the sandbox was always there.
On the other hand, I need to admit that I would not have foreseen some of the more recent use cases for WebAssembly.
It is "slow" relative to the overhead of calling from WASM into JS to manipulate the DOM from JS.
To be fair, I don't know how many clock cycles creating, destroying or modifying a DOM node costs on average, but most likely "a lot" compared to the overhead of a WASM to JS call because a lot more machinery is involved.
The difficulty is inherent; C, C++ and so on live in a very different world to JavaScript. Whether or not WebAssembly had direct interaction with JavaScript objects at launch or not, writing bridging code would still be tedious.
But there's no reason you must write this yourself. Others have done the hard work for you and written libraries.
That was the original message used to sell WebAssembly; however, since the real goal was to replace ActiveX, Flash, Silverlight, and PNaCl, it was obvious that it would grow beyond that.
Really? I thought its purpose was to take us back to the good old days of sellable proprietary binary blobs instead of the more open HTML/JS/CSS stack.
The lack of strings makes sense, as many different languages and standard libraries have their own implementations, which can behave slightly differently. It leaves the implementation of strings to the compilers/linkers, as is generally the case for assembly as well.
The lack of a DOM API is something I sorely miss as well. It's currently possible (and not that hard, you can just interact with JS), but comes with such performance overhead that you lose the entire benefit of WASM.
WASM is supposed to be 'assembly' level, a little bit like Java bytecode, so it's lower level than 'strings'.
But as you have pointed out, the missing layer on top, i.e. the 'thing we can practically use', is a big gaping hole, and it's a little bit diabolical.
The fact that JS has gotten so much faster, combined with the lack of both higher-level abstractions and, notably, a really good 'bridge' to JS, means WASM has lagged in terms of material applicability.
It's also still missing proper garbage collection, meaning languages like C# have to include basically the entire runtime if you compile to WebAssembly. This is a major part of why Blazor apps in .NET 5 are ~2MB for a simple "Hello World" (closer to 8MB if you use the AOT compilation options in the .NET 6 preview).
Why would WASM have garbage collection? It's an assembly target, not a runtime. What if languages would want different memory management strategies?
I know it's an existing proposal for WASM, but it feels so massively out of scope. If the issue is having to include runtimes in the WASM binary it might be more useful to think about how we could serve runtimes in a more efficient way.
I feel the same way. I find it very odd that GC is something that WASM ever intends to think about. If shipping your entire runtime sucks, find a smaller runtime?
The problem is that each runtime has different GC requirements, so at best it will mean WASM GC semantics will be the underlying JS GC semantics, probably not what you want for a D or .NET GC, for example.
No one has ever articulated the details of what they mean by these runtimes having different GC requirements. JavaScript garbage collection has no "semantics"--it is entirely invisible to applications. Even WeakMap and WeakSet do not expose garbage collection details because they are not iterable.
The memory profile of JavaScript applications tends to look a lot like the memory profile of typical Java applications. It tends to be a law of large numbers.
Now if you want to talk about details of how we implement runtimes that do have observable GC details, like weak callbacks, Java's zoo of reference types, etc, then let's do that, because Wasm GC will eventually need to have low-level mechanisms to support those.
But if we're talking about a Wasm engine GC's ability to allocate, trace, move (or not!) little blocks of memory around, then I don't see any fundamental stumbling blocks to making that mechanism efficient and universal.
For example, the existing JS GC doesn't need to expose to the developer the ability to stop the GC, run it on demand, support value types, pin memory, take interior pointers, control GC regions, or marshal to native code, whereas a .NET or D GC does.
So if a future WASM GC doesn't offer APIs for such capabilities, it is useless from those runtimes' point of view.
Thanks for getting down to brass tacks. Of the things that you mentioned, I think that interior pointers are the only thing that is relevant.
Java has an API for executing the GC on demand, and VM engineers I have talked to over the years think it's a knob that apps shouldn't have.
Wasm already supports multiple return values, so you don't need to box value types on any boundary--they can be flattened wherever they occur.
Pinning memory has to do with interfacing native code that could potentially do unsafe things. That doesn't fit into wasm's model, and would only be necessary for interacting with platform APIs, which are being designed not to need that. Same for "marshalling to native code".
I don't understand what you mean by GC regions. Realtime Java had GC regions and a complex system for trying to allow threads to run without touching the heap. It really didn't go well. I think if regions are useful for a GC, the engine should do inference of them, because adding regions to the type system infects everything.
I would assume that there are Microsoft folks involved to make sure it works out satisfactorily, given their investment in Blazor, but yes, it's always a possibility that an API is bad. I don't know what their level of interest in embedding wasm inside C# is.
That was just an example; there is a plethora of GC algorithms to choose from, each of which needs to be fine-tuned for the specific runtime it is applied to, if performance is of any concern to the language implementers.
I have worked on a number of runtimes, and it is not generally the case that a GC needs to be "tuned" for a runtime; rather, a GC co-evolves with a runtime, and features or misfeatures of the runtime determine the path of least resistance for developing more advanced GC algorithms. The interplay tends to involve a lot of technical debt if the separation is poor from the outset. But regardless, it's rare that a runtime develops more than a couple of GC algorithms unless it has a very long lifetime or is explicitly designed to allow swappable GCs, like Jikes RVM with MMTk.
GC performance depends more on the program than the language.
But regardless, the hardest parts of getting to advanced GCs, such as concurrent and parallel algorithms, are usually very deep assumptions of single-threadedness and uninterruptibility that linger as debt in the runtime. It usually doesn't help that most runtimes are written in C/C++ and suffer that environment's complete uncooperativeness[1] in finding and manipulating roots.
[1] To the point of seeming hostility. It's been how many years and LLVM still fights against supporting stack maps?
Interesting that Firefox did not render the fizzbuzz demo correctly by default.
I had to click the canvas icon next to the address bar and allow canvas usage.
And it did not show a prompt either. It just looked broken.
...Good! The "prompt" permission model is fundamentally broken, because all it does is train you to click through the prompt.
The "click the blocking button and turn it off" model is much better. It still trains you to turn off blocking when something is broken. However, crucially, that's only when it's broken. When it's not broken, you just use the site, instead of habitually clicking through the permission prompt that's just harvesting data, not actually needed to function.
And yes, malicious sites can of course display themselves as falsely broken until you grant the permissions. But this makes them more annoying to use, granting a UX edge to the honest sites which don't request unnecessary permissions. In other words, the incentives of sites and users are more aligned.
"Then, I threw out everything which is either not needed or looks complicated. We only need the string formatting functions anyway, let's remove everything else. Th result is a crossover of musl 1.2.2 and arch from emscripten for musl 1.1.15. YOLO!"
Sorry if the answer is to read the whole series (I only read part 4), but is there a comparison of this hand-optimized route vs. what Emscripten outputs (in terms of binary size and browser perf)?
I assume a proper Emscripten comparison would also need to strip networking & audio output.