This is not memory safe. Having generational pointers to do epoch checking on dereference is a stochastic mitigation. It is equivalent to PAC or Chromium MiraclePtr but with even less defense against adversarial attackers, because any infoleak of the generation allows for forging of UaF pointers.
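To make the mechanism concrete, here's a minimal sketch (names and layout are mine, not the proposal's) of what a generational reference check looks like: the reference carries the generation it was created under, and dereference compares it against the allocation's current generation.

```cpp
#include <cstdint>

// Hypothetical sketch of a generational reference. The slot's generation is
// bumped on every free, so any reference created before the free no longer
// matches and the dereference fails loudly instead of touching stale memory.
struct Slot {
    uint64_t generation = 0;  // bumped on every free
    int value = 0;
};

struct GenRef {
    Slot* slot;
    uint64_t generation;  // generation captured when the reference was made

    int* deref() const {
        // The safety check: mismatch means the slot was freed (and possibly
        // reused) after this reference was taken.
        if (slot->generation != generation) return nullptr;
        return &slot->value;
    }
};

GenRef make_ref(Slot* s) { return GenRef{s, s->generation}; }

void free_slot(Slot* s) {
    s->generation++;  // invalidates every outstanding GenRef to this slot
}
```

This also makes the attack surface visible: anyone who can read or guess the slot's current generation can forge a GenRef that passes the check, which is exactly the infoleak concern above.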
The "high RAII" concept is also not actually misuse-proof. There doesn't appear to be anything preventing the user from calling free themselves instead of RemoveShipFromDisplayCache. The types are linear, but the final sink is always free, which doesn't encode the full usage semantics.
It depends. There are two options of generational references we've been experimenting with:
* Table-based generational references where the generations are stored in a separate table, and we retire overflowed slots. This is guaranteed memory-safe.
* Random generational references which are stochastic and enable storing objects on the stack, and inline in arrays and objects.
The latter's memory safety depends on definitions and baseline:
* It's not as safe as completely memory-safe languages like TypeScript/JavaScript.
* It's safer than C or C++.
* It could be more or less safe than Rust, where most programs use unsafe directly or in (non-stdlib) dependencies [0]. For mutable aliasing, instead of trying to use unsafe correctly as in Rust, a Vale program would use generational references, which are checked (leaving aside the RC capabilities of both languages).
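The table-based variant above can be sketched roughly like this (my own names and a deliberately tiny design, not Vale's actual implementation): generations live in a side table indexed by slot, and a slot whose generation would overflow is retired rather than reused, which is what makes this variant deterministic rather than stochastic.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch of table-based generational references.
struct Table {
    std::vector<uint64_t> generations;
    std::vector<int> values;
    std::vector<size_t> free_list;

    // Allocate a slot, reusing a freed one if available.
    size_t alloc(int v) {
        size_t i;
        if (!free_list.empty()) {
            i = free_list.back();
            free_list.pop_back();
        } else {
            i = generations.size();
            generations.push_back(0);
            values.push_back(0);
        }
        values[i] = v;
        return i;
    }

    void release(size_t i) {
        generations[i]++;
        // Retire the slot at (near-)overflow: it is never handed out again,
        // so a stale handle can never match a recycled generation.
        if (generations[i] != UINT64_MAX) free_list.push_back(i);
    }
};

struct Handle { size_t index; uint64_t gen; };

std::optional<int> get(const Table& t, Handle h) {
    if (t.generations[h.index] != h.gen) return std::nullopt;  // stale handle
    return t.values[h.index];
}
```

The cost of the guarantee is the indirection through the table, which is why the random variant exists: it gives up the table (and the determinism) to allow objects on the stack and inline in arrays.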
It's a very strong stochastic measure. Whereas simple memory tagging uses 4 bits and has a false positive rate of about 6%, these use 64 bits for a rate close to zero. Combined with how they fail very loudly, invalid-dereference bugs don't lie hidden for years as they can in unsafe Rust and C programs.
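The arithmetic behind those two figures, spelled out (the 4 GHz-style hardware details don't matter here, only the tag width): with n tag bits, a dangling access matches a reused slot's tag with probability 2^-n.

```cpp
#include <cmath>

// False-positive probability of an n-bit tag: 2^-n.
double tag_false_positive_rate(int bits) {
    return std::ldexp(1.0, -bits);  // exact for powers of two
}
// 4 bits  -> 1/16 = 6.25%, the ~6% quoted for simple memory tagging
// 64 bits -> ~5.4e-20, effectively zero for accidental (non-adversarial) bugs
```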
Additionally, one can code in a way that doesn't use any generational references at all. Ada has SPARK, Rust has Ferrocene, and Vale has linear style + regions [1].
Still, if one doesn't want to think about any of this, then safer languages like TS/JS are great options.
You're correct that if a generation leaks, it could be a problem for this C-like language's approach. Vale largely solves this by making it so user code can't read a generation (though, as in any language, unsandboxed untrusted code can get around that, of course).
> where most programs use unsafe directly or in (non-stdlib) dependencies [0].
Interesting study I hadn't seen before. I think stdlib-vs-external-crate is a somewhat arbitrary distinction. Rust deliberately chooses to have a relatively small standard library, and there is a core set of special crates written by the same people, and to the same quality, as the standard library. Hashbrown is a particularly obvious example: the standard library HashMap and HashSet are a thin wrapper around the external crate, so using the external crate is just as safe.
I'd be interested in a more recent paper that includes miri test coverage.
The authors also have a good caveat that it's 60% of popular libraries, not 60% of codebases. If you browse around crates.io comparing download stats to dependency stats you'll see download stats are far, far larger, but that the difference varies wildly depending on the type of crate. Crates that tend to be used by applications but not libraries tend not to be depended on by crates published to crates.io. I'd expect these crates to be less likely to use unsafe. But it's still clearly a lot. (And I think my point above that there isn't a clear stdlib vs external crates distinction cuts the other way as well: there have been soundness bugs in the standard library.)
As someone with an interest in the subject but not a domain expert, I have an imprecise understanding of what you both are referring to when you say "a stochastic measure."
Do you have any resources you could point me to for reading up on stochastic memory management?
Mostly I think you're right, though I don't think this is pointer tagging a la PAC or auto-zeroing (etc.) a la MiraclePtr, this just looks like compiler support for unique_ptr.
I'm with you on the high RAII: it sounds like destructors with some important missing steps. It also looks like the idea of "SOPs must eventually be freed" runs into some immediate Turing problems, i.e. you can't know an SOP will be freed when you have if statements, or a cycle of functions passing an SOP around that conditionally free an SOP depending on some external state, or program behavior contingent on external messaging, etc.
I think another place you'd run into problems with this is... you actually do want references to heap allocated memory. It's pretty laborious to get by with unique_ptr all on its own; you want to lend out shared references that you get back as their scopes close, and so on. It doesn't seem like there's any facility for that here.
To people trying to build systems like this without building Rust's system: honestly just try it in C++. Sure there won't be compiler support for it, so you'll have to manually insert stuff, but just see how workable it is. If you were working with this system in a program of any appreciable complexity you'd immediately realize:
- You don't reliably know when things haven't been freed
- You really want references
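Running that experiment takes about ten lines. A sketch of what "just unique_ptr, no references" feels like (Ship, print_name, etc. are my own illustrative names):

```cpp
#include <memory>
#include <string>

struct Ship { std::string name; };

// Without borrowing, every function that merely *reads* the ship has to take
// ownership and hand it back -- the caller must remember to reclaim it, or
// the ship is gone.
std::unique_ptr<Ship> print_name(std::unique_ptr<Ship> ship) {
    // ... read ship->name here ...
    return ship;  // ownership ping-pongs back to the caller
}

// With a plain reference you get the ergonomics back, but nothing in the
// language tracks whether the referent outlives the reference -- the gap a
// borrow checker (or generational check) is meant to fill.
std::size_t name_len(const Ship& ship) { return ship.name.size(); }
```

The ownership ping-pong in print_name is workable for toy code and rapidly becomes laborious in anything with real call graphs, which is exactly the "you really want references" point.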
---
An interesting alternative way to look at what's proposed here is that this is just an oblique message-passing system.
By this definition Rust is also not memory safe since you can use unsafe to free a pointer. You normally don't have access to the generation so you can't forge it without unsafe code.
> There doesn't look like there's actually anything preventing the user from calling free themselves
True, but wouldn't that still be memory safe? It seems to me that it could lead to a memory leak, but not to double use of memory. But another annotation to the type so that free() can only be called at one point (or perhaps a few points) might solve that.
It doesn't leak memory, but the example use case of "high RAII" isn't about not leaking memory; it's that "the compiler can ensure you remember to do something in the future and don't accidentally forget it" and "remember to remove something from a cache that you previously added to". That guarantee isn't actually delivered, because the consumer can call free and the data will still erroneously be in the cache, for example.
You can't free a buffer/object twice in the proposal. After a free, you can't use the pointer any longer.
But it might very well be that this approach is not hard enough for use in e.g. the kernel. It might still be useful for userland applications, though.
This is memory safe if you never re-use a (generational index, address) pair.
They seem to use 64-bit generations, so to overflow a single memory cell you would have to allocate and free it every cycle (!) at 4 GHz for about 150 years.
Basically as long as you accept this incredibly small "memory leak" you're fine.
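The back-of-the-envelope math, for anyone who wants to check the 150-year figure (assuming a 4 GHz clock and one alloc+free per cycle, which is already absurdly generous to the attacker):

```cpp
// A 64-bit generation bumped once per cycle at 4 GHz takes 2^64 / 4e9
// seconds to wrap.
const double cycles  = 18446744073709551616.0;       // 2^64
const double hz      = 4e9;                          // 4 GHz, one alloc+free per cycle
const double seconds = cycles / hz;                  // ~4.6e9 seconds
const double years   = seconds / (365.25 * 24 * 3600);  // ~146 years
```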
By memory leak you mean that every memory location has a limited number of alloc+free and once it passes that, the memory is unusable?
Maybe you should have said, "to overflow the generational index you would have to exclusively allocate and free it with no work being performed for 150 years. Once the overflow happens the only consequence is that this memory cell will never be allocated again."