This is not memory safe. Having generational pointers to do epoch checking on dereference is a stochastic mitigation. It is equivalent to PAC or Chromium MiraclePtr but with even less defense against adversarial attackers, because any infoleak of the generation allows for forging of UaF pointers.
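To make the mechanism concrete, here's a minimal sketch (names and layout are mine, not the proposal's) of what a generational reference check looks like: the reference carries the generation it was created under, and dereference compares it against the allocation's current generation.

```cpp
#include <cstdint>

// Hypothetical sketch of a generational reference. The slot's generation is
// bumped on every free, so any reference created before the free no longer
// matches and the dereference fails loudly instead of touching stale memory.
struct Slot {
    uint64_t generation = 0;  // bumped on every free
    int value = 0;
};

struct GenRef {
    Slot* slot;
    uint64_t generation;  // generation captured when the reference was made

    int* deref() const {
        // The safety check: mismatch means the slot was freed (and possibly
        // reused) after this reference was taken.
        if (slot->generation != generation) return nullptr;
        return &slot->value;
    }
};

GenRef make_ref(Slot* s) { return GenRef{s, s->generation}; }

void free_slot(Slot* s) {
    s->generation++;  // invalidates every outstanding GenRef to this slot
}
```

This also makes the attack surface visible: anyone who can read or guess the slot's current generation can forge a GenRef that passes the check, which is exactly the infoleak concern above.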
The "high RAII" concept is also not actually misuse-proof. There doesn't appear to be anything preventing the user from calling free themselves instead of RemoveShipFromDisplayCache. The types are linear, but the final sink is always free, which doesn't encode the full usage semantics.
It depends. There are two options of generational references we've been experimenting with:
* Table-based generational references where the generations are stored in a separate table, and we retire overflowed slots. This is guaranteed memory-safe.
* Random generational references which are stochastic and enable storing objects on the stack, and inline in arrays and objects.
The latter's memory safety depends on definitions and baseline:
* It's not as safe as completely memory-safe languages like TypeScript/JavaScript.
* It's safer than C or C++.
* It could be more or less safe than Rust, where most programs use unsafe directly or in (non-stdlib) dependencies [0]. For mutable aliasing, instead of trying to use unsafe correctly as in Rust, a Vale program would use generational references, which are checked (leaving aside the RC capabilities of both languages).
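The table-based variant above can be sketched roughly like this (my own names and a deliberately tiny design, not Vale's actual implementation): generations live in a side table indexed by slot, and a slot whose generation would overflow is retired rather than reused, which is what makes this variant deterministic rather than stochastic.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch of table-based generational references.
struct Table {
    std::vector<uint64_t> generations;
    std::vector<int> values;
    std::vector<size_t> free_list;

    // Allocate a slot, reusing a freed one if available.
    size_t alloc(int v) {
        size_t i;
        if (!free_list.empty()) {
            i = free_list.back();
            free_list.pop_back();
        } else {
            i = generations.size();
            generations.push_back(0);
            values.push_back(0);
        }
        values[i] = v;
        return i;
    }

    void release(size_t i) {
        generations[i]++;
        // Retire the slot at (near-)overflow: it is never handed out again,
        // so a stale handle can never match a recycled generation.
        if (generations[i] != UINT64_MAX) free_list.push_back(i);
    }
};

struct Handle { size_t index; uint64_t gen; };

std::optional<int> get(const Table& t, Handle h) {
    if (t.generations[h.index] != h.gen) return std::nullopt;  // stale handle
    return t.values[h.index];
}
```

The cost of the guarantee is the indirection through the table, which is why the random variant exists: it gives up the table (and the determinism) to allow objects on the stack and inline in arrays.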
It's a very strong stochastic measure. Whereas simple memory tagging uses 4 bits and has a false positive rate of about 6%, these use 64 bits for a rate close to zero. Combined with how they fail very loudly, invalid-dereference bugs don't lie hidden for years as they can in unsafe Rust and C programs.
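The arithmetic behind those two figures, spelled out (the 4 GHz-style hardware details don't matter here, only the tag width): with n tag bits, a dangling access matches a reused slot's tag with probability 2^-n.

```cpp
#include <cmath>

// False-positive probability of an n-bit tag: 2^-n.
double tag_false_positive_rate(int bits) {
    return std::ldexp(1.0, -bits);  // exact for powers of two
}
// 4 bits  -> 1/16 = 6.25%, the ~6% quoted for simple memory tagging
// 64 bits -> ~5.4e-20, effectively zero for accidental (non-adversarial) bugs
```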
Additionally, one can code in a way that doesn't use any generational references at all. Ada has SPARK, Rust has Ferrocene, and Vale has linear style + regions [1].
Still, if one doesn't want to think about any of this, then safer languages like TS/JS are great options.
You're correct that if a generation leaks, it could be a problem for this C-like language's approach. Vale largely solves this by making it so user code can't read a generation (though, as in any language, unsandboxed untrusted code can get around that, of course).
> where most programs use unsafe directly or in (non-stdlib) dependencies [0].
Interesting study I hadn't seen before. I think stdlib-vs-external-crate is a somewhat arbitrary distinction. Rust deliberately chooses to have a relatively small standard library, and there is a core set of special crates written by the same people, and to the same quality, as the standard library. Hashbrown is a particularly obvious example: the standard library HashMap and HashSet are a thin wrapper around the external crate, so using the external crate is just as safe.
I'd be interested in a more recent paper that includes miri test coverage.
The authors also have a good caveat that it's 60% of popular libraries, not 60% of codebases. If you browse around crates.io comparing download stats to dependency stats you'll see download stats are far, far larger, but that the difference varies wildly depending on the type of crate. Crates that tend to be used by applications but not libraries tend not to be depended on by crates published to crates.io. I'd expect these crates to be less likely to use unsafe. But it's still clearly a lot. (And I think my point above that there isn't a clear stdlib vs external crates distinction cuts the other way as well: there have been soundness bugs in the standard library.)
As someone with an interest in the subject but not a domain expert, I have an imprecise understanding of what you both are referring to when you say "a stochastic measure."
Do you have any resources you could point me to for reading up on stochastic memory management?
Mostly I think you're right, though I don't think this is pointer tagging a la PAC or auto-zeroing (etc.) a la MiraclePtr, this just looks like compiler support for unique_ptr.
I'm with you on the high RAII: it sounds like destructors with some important missing steps. It also looks like the idea of "SOPs must eventually be freed" runs into some immediate Turing problems, i.e. you can't know an SOP will be freed when you have if statements, or a cycle of functions passing an SOP around that conditionally free an SOP depending on some external state, or program behavior contingent on external messaging, etc.
I think another place you'd run into problems with this is... you actually do want references to heap allocated memory. It's pretty laborious to get by with unique_ptr all on its own; you want to lend out shared references that you get back as their scopes close, and so on. It doesn't seem like there's any facility for that here.
To people trying to build systems like this without building Rust's system: honestly just try it in C++. Sure there won't be compiler support for it, so you'll have to manually insert stuff, but just see how workable it is. If you were working with this system in a program of any appreciable complexity you'd immediately realize:
- You don't reliably know when things haven't been freed
- You really want references
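Running that experiment takes about ten lines. A sketch of what "just unique_ptr, no references" feels like (Ship, print_name, etc. are my own illustrative names):

```cpp
#include <memory>
#include <string>

struct Ship { std::string name; };

// Without borrowing, every function that merely *reads* the ship has to take
// ownership and hand it back -- the caller must remember to reclaim it, or
// the ship is gone.
std::unique_ptr<Ship> print_name(std::unique_ptr<Ship> ship) {
    // ... read ship->name here ...
    return ship;  // ownership ping-pongs back to the caller
}

// With a plain reference you get the ergonomics back, but nothing in the
// language tracks whether the referent outlives the reference -- the gap a
// borrow checker (or generational check) is meant to fill.
std::size_t name_len(const Ship& ship) { return ship.name.size(); }
```

The ownership ping-pong in print_name is workable for toy code and rapidly becomes laborious in anything with real call graphs, which is exactly the "you really want references" point.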
---
An interesting alternative way to look at what's proposed here is that this is just an oblique message-passing system.
By this definition Rust is also not memory safe since you can use unsafe to free a pointer. You normally don't have access to the generation so you can't forge it without unsafe code.
> There doesn't look like there's actually anything preventing the user from calling free themselves
True, but wouldn't that still be memory safe? It seems to me that it could lead to a memory leak, but not to double use of memory. But another annotation to the type so that free() can only be called at one point (or perhaps a few points) might solve that.
It doesn't leak memory, but the example use case of "high RAII" isn't about not leaking memory; it's that "the compiler can ensure you remember to do something in the future and don't accidentally forget it" and "remember to remove something from a cache that you previously added to". That guarantee isn't actually delivered, because the consumer can call free and the data will still erroneously be in the cache, for example.
You can't free a buffer/object twice in the proposal. After a free, you can't use the pointer any longer.
But it might very well be that this approach is not hard enough for use in e.g. the kernel. It might still be useful for userland applications, though.
This is memory safe if you never re-use a (generational index, address) pair.
They seem to use 64-bit generations, so to overflow a single memory cell you would have to allocate and free it every cycle (!) at 4 GHz for about 150 years.
Basically as long as you accept this incredibly small "memory leak" you're fine.
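The back-of-the-envelope math, for anyone who wants to check the 150-year figure (assuming a 4 GHz clock and one alloc+free per cycle, which is already absurdly generous to the attacker):

```cpp
// A 64-bit generation bumped once per cycle at 4 GHz takes 2^64 / 4e9
// seconds to wrap.
const double cycles  = 18446744073709551616.0;       // 2^64
const double hz      = 4e9;                          // 4 GHz, one alloc+free per cycle
const double seconds = cycles / hz;                  // ~4.6e9 seconds
const double years   = seconds / (365.25 * 24 * 3600);  // ~146 years
```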
By memory leak you mean that every memory location has a limited number of alloc+free and once it passes that, the memory is unusable?
Maybe you should have said, "to overflow the generational index you would have to exclusively allocate and free it with no work being performed for 150 years. Once the overflow happens the only consequence is that this memory cell will never be allocated again."