So why is `unsafe` even in the language, then? It would seem to me that it simultaneously undermines safety guarantees, and it can't even ensure performance improvements.
Because performance improvements are not why `unsafe` exists. It exists to disable various conservative compile-time checks in order to let you write code that the compiler is unable to prove correctness properties about.
One reason to do this might be performance. An example: if you can prove lifetime properties but the compiler can’t, you might be able to just use raw pointers without the borrow checker, which is unsafe. This can let you avoid the performance penalty of smart pointers or cloning.
But there are other reasons to do it, too. Some things simply can’t be reasoned about by the compiler, like calling external C code, so they always need `unsafe`.
> It exists to disable various conservative compile-time checks
There is a subtle but important distinction here. Unsafe does not disable any compile-time checks. It adds additional constructs which are not checked.
Thank you, you are right. It enables certain constructs, e.g. dereferencing raw pointers, which have fewer compile-time checks than safe constructs, e.g. references.
At some point a Rust program does have to run on a real platform. Anything interacting with that platform will be `unsafe` because Rust cannot reason about things outside itself. You'd usually wrap these unsafe things in safe functions. Note this isn't about performance at all.
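For instance, here is a minimal sketch of that wrapping pattern (assuming the `libc` crate as a dependency and a Unix-like target; the function name write_stdout is mine):

/// Safe wrapper around the raw write(2) call on file descriptor 1 (stdout).
fn write_stdout(bytes: &[u8]) -> isize {
    // SAFETY: `bytes` is a valid, initialized buffer for the duration of the
    // call, and fd 1 is stdout on Unix.
    unsafe { libc::write(1, bytes.as_ptr() as *const libc::c_void, bytes.len()) }
}

fn main() {
    write_stdout(b"hello world\n");
}

The unsafety is contained in one audited spot; callers only ever see a safe signature.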
Also, in Rust's model there are only unique or shared pointers (and the owned resources they point to). That alone can't represent every data structure, so you need some unsafe code (ideally wrapped in safe abstractions) to build those structures.
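The classic example is a doubly-linked list: every interior node is reachable from two directions, which unique/shared ownership alone can't express. A minimal sketch of the idea (my own illustrative code, not std's LinkedList, which does the same thing with far more care):

use std::ptr;

struct Node {
    value: i32,
    prev: *mut Node,
    next: *mut Node,
}

pub struct DoublyLinkedList {
    head: *mut Node,
    tail: *mut Node,
}

impl DoublyLinkedList {
    pub fn new() -> Self {
        DoublyLinkedList { head: ptr::null_mut(), tail: ptr::null_mut() }
    }

    pub fn push_back(&mut self, value: i32) {
        // Heap-allocate the node and take ownership of it as a raw pointer.
        let node = Box::into_raw(Box::new(Node { value, prev: self.tail, next: ptr::null_mut() }));
        if self.tail.is_null() {
            self.head = node;
        } else {
            // SAFETY: `tail` came from Box::into_raw and has not been freed.
            unsafe { (*self.tail).next = node; }
        }
        self.tail = node;
    }

    pub fn pop_back(&mut self) -> Option<i32> {
        if self.tail.is_null() {
            return None;
        }
        // SAFETY: `tail` is a live node we own; re-boxing it frees it on drop.
        unsafe {
            let node = Box::from_raw(self.tail);
            self.tail = node.prev;
            if self.tail.is_null() {
                self.head = ptr::null_mut();
            } else {
                (*self.tail).next = ptr::null_mut();
            }
            Some(node.value)
        }
    }

    pub fn front(&self) -> Option<i32> {
        if self.head.is_null() {
            None
        } else {
            // SAFETY: `head` points to a live node owned by the list.
            Some(unsafe { (*self.head).value })
        }
    }
}

impl Drop for DoublyLinkedList {
    fn drop(&mut self) {
        // Free every remaining node exactly once.
        while self.pop_back().is_some() {}
    }
}

The raw pointers and unsafe blocks never escape the struct, so users of DoublyLinkedList stay entirely in safe code.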
And as noted, `unsafe` can be used for performance improvements if benchmarking shows that breaking out of Rust's model does indeed improve performance. But you can't simply assume safe Rust incurs a performance hit or that your unsafe code will optimize better than the safe code.
In short, my point was one of emphasis and framing. When thinking about `unsafe`, it shouldn't be thought of as a performance feature per se. That may be one effect of using certain features in certain situations, but it's an effect that needs to be proved by robust benchmarking.
`unsafe` is in the language for compatibility, not performance. Rust programs run between countless other chunks of software, most of which do not have comparable safety features, and the same goes for the communication channels in between. But you still want Rust programs to be fully compatible with that "outside world" when needed.
Using 'unsafe' isn't about improving performance, though it may be required to enable some performance improvements. It allows you to use constructs that the Rust safety checking infrastructure can't reason about. Memory used as a DMA target (ubiquitous in database engines) is an example.
Unsafe can't ensure performance improvements, but you can sometimes get performance boosts using it.
It should be a last resort, for when you have a bottleneck that is the result of memory restrictions.
It's also how FFI works. If you're calling a C library, you can no longer guarantee memory safety, since it's outside of Rust's domain, so it needs to be unsafe.
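A minimal sketch of what that looks like, calling strlen from the C standard library (usize standing in for C's size_t here; on the 2024 edition the block additionally has to be declared `unsafe extern "C"`):

use std::os::raw::c_char;

extern "C" {
    // The compiler takes this signature on faith; it cannot check the C side.
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    // A NUL-terminated byte string, as C expects.
    let s = b"hello\0";
    // SAFETY: `s` is valid, NUL-terminated, and outlives the call.
    let len = unsafe { strlen(s.as_ptr() as *const c_char) };
    assert_eq!(len, 5);
}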
> Unsafe can't ensure performance improvements, but you can sometimes get performance boosts using it.
Maybe to back this up with a concrete example: let's say I'm parsing something and I have a &[u8] (a slice of bytes) which I have already verified contains only ASCII digits. I want to get the numerical value of that number. I don't want to write my own number-parsing code, but std only has a parsing function for &str (UTF-8 strings), not for &[u8]. I could do
let string = str::from_utf8(bytes)?;
let value: u64 = string.parse()?;
But then str::from_utf8 would iterate over the bytestring again to verify that it's valid UTF-8. This check is useless because I already know the string only contains ASCII digits. So in this case, I can improve performance with an unsafe block:
// SAFETY: `bytes` was already proven to only contain ASCII digits
let string = unsafe { str::from_utf8_unchecked(bytes) };
let value: u64 = string.parse()?;
The performance gain comes not from unsafe per se, it comes from using a different function that skips the UTF-8 check. Since this function does not guarantee on its own that str's invariants are upheld, it's marked unsafe.
The intention is to extract as much valid &str from the given buf as possible, and only fail if there is no more valid &str to read from the buf. Unfortunately, std::str::from_utf8's error does not also yield a &str corresponding to the part that did parse successfully ( * ), so I have to compute it myself. Using std::str::from_utf8 for that second pass ends up verifying the UTF-8-ness of the buf again (confirmed by looking at the generated asm), because the compiler isn't sufficiently smart to elide the check. Similarly, slicing buf with valid_up_to safely also reruns the bounds check, which is why the code uses get_unchecked for that instead.
( * ) Nothing prevents it from doing that, and one of these days I might get around to PRing it.
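Roughly, that pattern looks like the following sketch (my own names, not the actual code being described):

/// Returns the longest prefix of `buf` that is valid UTF-8.
fn valid_prefix(buf: &[u8]) -> &str {
    match std::str::from_utf8(buf) {
        Ok(s) => s,
        Err(e) => {
            // `e.valid_up_to()` is the number of leading bytes that were
            // already verified as valid UTF-8, so both the bounds check and
            // the UTF-8 check below would be redundant.
            // SAFETY: the index is in bounds and the slice is valid UTF-8,
            // as established by from_utf8 above.
            unsafe {
                std::str::from_utf8_unchecked(buf.get_unchecked(..e.valid_up_to()))
            }
        }
    }
}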
Isn't this a bit fragile though? It's totally fine as long as you own and understand the code. But what's to stop someone new and/or inexperienced from coming along and violating the 'already proven' claim? How will you know?
Isn't there a risk of Rust code having "Don't ever change this" comments?
That's a general problem with coding: any future edit can violate an invariant. The advantage with Rust is that, for several types of invariants, you only need to check the places with an unsafe block during code review or audits. Everywhere else, the compiler upholds these invariants for you.
Why not lift the encoding to the type system? The type system could remember which strings are ASCII and which are UTF-8. More generally, why not lift such proofs to types?
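You can, at least for this case: a newtype whose only constructor performs the check makes "holding the value" the proof. A minimal sketch (the name AsciiStr is illustrative, not from std; crates like ascii provide a polished version of the same idea):

pub struct AsciiStr<'a>(&'a str);

impl<'a> AsciiStr<'a> {
    /// The only way to construct an `AsciiStr` is through this check,
    /// so possessing one *is* the proof that the bytes are ASCII.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.is_ascii() {
            // SAFETY: ASCII is a subset of UTF-8, so the bytes form a valid str.
            Some(AsciiStr(unsafe { std::str::from_utf8_unchecked(bytes) }))
        } else {
            None
        }
    }

    pub fn as_str(&self) -> &'a str {
        self.0
    }
}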
If you're trying to print "hello world" on the screen, you need to make a system call. Doing so is unsafe, by Rust's definition.
There are two main solutions to this:
1. Have some sort of "unsafe" language construct so that you can make syscalls yourself.
2. Disallow arbitrary syscalls, and put the code for making them inside some trusted code, whether that be the runtime or the standard library.
Rust has chosen option #1. Java has chosen #2.
(Also, virtually every language has an "unsafe mode" in whatever C FFI exists. I was referring to sun.misc.Unsafe earlier, but JNI still exists, of course.)
To rephrase my point: including an unsafe mode is an option when designing a language. Your original comment appeared to be suggesting that it was a necessity, and that one could not write a Hello World program in a language that lacks an unsafe mode. Which is not the case, of course; JavaScript, for instance.
unsafe lets you exit the small subset of code patterns whose safety the compiler can prove.
Not everything you write in unsafe is necessarily unsafe: the keyword only means that the compiler can't prove it safe. Another way of thinking about it: if unsafe only let you write genuinely unsafe code, it would be useless, because that unsafe code would eventually segfault.