So why is `unsafe` even in the language, then? It would seem to me that it simultaneously undermines safety guarantees, and it can't even ensure performance improvements.
Because performance improvements are not why `unsafe` exists. It exists to disable various conservative compile-time checks in order to let you write code that the compiler is unable to prove correctness properties about.
One reason to do this might be performance. An example: if you can prove lifetime properties but the compiler can’t, you might be able to just use raw pointers without the borrow checker, which is unsafe. This can let you avoid the performance penalty of smart pointers or cloning.
But there are other reasons to do it, too. Some things simply can’t be reasoned about by the compiler, like calling external C code, so they always need `unsafe`.
> It exists to disable various conservative compile-time checks
There is a subtle but important distinction here. Unsafe does not disable any compile-time checks. It adds additional constructs which are not checked.
Thank you, you are right. It enables certain constructs, e.g. dereferencing raw pointers, which have fewer compile-time checks than safe constructs, e.g. references.
At some point a Rust program does have to run on a real platform. Anything interacting with that platform will be `unsafe` because Rust cannot reason about things outside itself. You'd usually wrap these unsafe things in safe functions. Note this isn't about performance at all.
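For instance, here is a minimal sketch of that wrapping pattern (assuming the `libc` crate as a dependency and a Unix-like target; the function name write_stdout is mine):

/// Safe wrapper around the raw write(2) call on file descriptor 1 (stdout).
fn write_stdout(bytes: &[u8]) -> isize {
    // SAFETY: `bytes` is a valid, initialized buffer for the duration of the
    // call, and fd 1 is stdout on Unix.
    unsafe { libc::write(1, bytes.as_ptr() as *const libc::c_void, bytes.len()) }
}

fn main() {
    write_stdout(b"hello world\n");
}

The unsafety is contained in one audited spot; callers only ever see a safe signature.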
Also, in Rust's model there are only unique or shared pointers (and the owned resources they point to). That alone can't represent every data structure, so you need some unsafe code (ideally wrapped in safe abstractions) to build those structures.
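The classic example is a doubly-linked list: every interior node is reachable from two directions, which unique/shared ownership alone can't express. A minimal sketch of the idea (my own illustrative code, not std's LinkedList, which does the same thing with far more care):

use std::ptr;

struct Node {
    value: i32,
    prev: *mut Node,
    next: *mut Node,
}

pub struct DoublyLinkedList {
    head: *mut Node,
    tail: *mut Node,
}

impl DoublyLinkedList {
    pub fn new() -> Self {
        DoublyLinkedList { head: ptr::null_mut(), tail: ptr::null_mut() }
    }

    pub fn push_back(&mut self, value: i32) {
        // Heap-allocate the node and take ownership of it as a raw pointer.
        let node = Box::into_raw(Box::new(Node { value, prev: self.tail, next: ptr::null_mut() }));
        if self.tail.is_null() {
            self.head = node;
        } else {
            // SAFETY: `tail` came from Box::into_raw and has not been freed.
            unsafe { (*self.tail).next = node; }
        }
        self.tail = node;
    }

    pub fn pop_back(&mut self) -> Option<i32> {
        if self.tail.is_null() {
            return None;
        }
        // SAFETY: `tail` is a live node we own; re-boxing it frees it on drop.
        unsafe {
            let node = Box::from_raw(self.tail);
            self.tail = node.prev;
            if self.tail.is_null() {
                self.head = ptr::null_mut();
            } else {
                (*self.tail).next = ptr::null_mut();
            }
            Some(node.value)
        }
    }

    pub fn front(&self) -> Option<i32> {
        if self.head.is_null() {
            None
        } else {
            // SAFETY: `head` points to a live node owned by the list.
            Some(unsafe { (*self.head).value })
        }
    }
}

impl Drop for DoublyLinkedList {
    fn drop(&mut self) {
        // Free every remaining node exactly once.
        while self.pop_back().is_some() {}
    }
}

The raw pointers and unsafe blocks never escape the struct, so users of DoublyLinkedList stay entirely in safe code.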
And as noted, `unsafe` can be used for performance improvements if benchmarking shows that breaking out of Rust's model does indeed improve performance. But you can't simply assume safe Rust incurs a performance hit or that your unsafe code will optimize better than the safe code.
In short, my point was one of emphasis and framing. When thinking about `unsafe`, it shouldn't be thought of as a performance feature per se. That may be one effect of using certain features in certain situations, but it's an effect that needs to be proved by robust benchmarking.
`unsafe` is in the language for compatibility, not performance. Rust programs run between countless other chunks of software, most of which do not have comparable safety features, and the same goes for the communication channels in between. But you still want Rust programs to be fully compatible with that "outside world" when needed.
Using 'unsafe' isn't about improving performance, though it may be required to enable some performance improvements. It allows you to use constructs that the Rust safety checking infrastructure can't reason about. Memory used as a DMA target (ubiquitous in database engines) is an example.
Unsafe can't ensure performance improvements, but you can sometimes get performance boosts using it.
It should be a last resort, for when you have a bottleneck that is the result of memory restrictions.
It's also how FFI works. If you're calling a C library, you can no longer guarantee memory safety, since it's outside of Rust's domain, so it needs to be unsafe.
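A minimal sketch of what that looks like, calling strlen from the C standard library (usize standing in for C's size_t here; on the 2024 edition the block additionally has to be declared `unsafe extern "C"`):

use std::os::raw::c_char;

extern "C" {
    // The compiler takes this signature on faith; it cannot check the C side.
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    // A NUL-terminated byte string, as C expects.
    let s = b"hello\0";
    // SAFETY: `s` is valid, NUL-terminated, and outlives the call.
    let len = unsafe { strlen(s.as_ptr() as *const c_char) };
    assert_eq!(len, 5);
}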
> Unsafe can't ensure performance improvements, but you can sometimes get performance boosts using it.
Maybe to back this up with a concrete example: let's say I'm parsing something and I have a &[u8] (a slice of bytes) which I have already verified contains only ASCII digits. I want to get the numerical value of that number. I don't want to write my own number-parsing code, but std only has a parsing function for &str (UTF-8 strings), not for &[u8]. I could do
let string = str::from_utf8(bytes)?;
let value: u64 = string.parse()?;
But then str::from_utf8 would iterate over the bytestring again to verify that it's valid UTF-8. This check is useless because I already know the string only contains ASCII digits. So in this case, I can improve performance with an unsafe block:
// SAFETY: `bytes` was already proven to only contain ASCII digits
let string = unsafe { str::from_utf8_unchecked(bytes) };
let value: u64 = string.parse()?;
The performance gain comes not from unsafe per se, it comes from using a different function that skips the UTF-8 check. Since this function does not guarantee on its own that str's invariants are upheld, it's marked unsafe.
The intention is to extract as much valid &str from the given buf as possible, and only fail if there is no more valid &str to read from the buf. Unfortunately, std::str::from_utf8's error does not also yield a &str corresponding to the part that did parse successfully ( * ), so I have to compute it myself. Using std::str::from_utf8 for that second pass ends up verifying the UTF-8-ness of the buf again (confirmed by looking at the generated asm), because the compiler isn't sufficiently smart to elide the check. Similarly, slicing buf with valid_up_to safely also reruns the bounds check, which is why the code uses get_unchecked for that instead.
( * ) Nothing prevents it from doing that, and one of these days I might get around to PRing it.
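Roughly, that pattern looks like the following sketch (my own names, not the actual code being described):

/// Returns the longest prefix of `buf` that is valid UTF-8.
fn valid_prefix(buf: &[u8]) -> &str {
    match std::str::from_utf8(buf) {
        Ok(s) => s,
        Err(e) => {
            // `e.valid_up_to()` is the number of leading bytes that were
            // already verified as valid UTF-8, so both the bounds check and
            // the UTF-8 check below would be redundant.
            // SAFETY: the index is in bounds and the slice is valid UTF-8,
            // as established by from_utf8 above.
            unsafe {
                std::str::from_utf8_unchecked(buf.get_unchecked(..e.valid_up_to()))
            }
        }
    }
}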
Isn't this a bit fragile though? It's totally fine as long as you own and understand the code. But what's to stop someone new and/or inexperienced from coming along and violating the 'already proven' claim? How will you know?
Isn't there a risk of Rust code having "Don't ever change this" comments?
That's a general problem with coding: any future edit can violate an invariant. The advantage with Rust is that, for several types of invariants, you only need to check the places with an unsafe block during code review or audits. Everywhere else, the compiler upholds these invariants for you.
Why not lift the encoding to the type system? The type system could remember which strings are ASCII and which are UTF-8. More generally, why not lift such proofs to types?
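You can, at least for this case: a newtype whose only constructor performs the check makes "holding the value" the proof. A minimal sketch (the name AsciiStr is illustrative, not from std; crates like ascii provide a polished version of the same idea):

pub struct AsciiStr<'a>(&'a str);

impl<'a> AsciiStr<'a> {
    /// The only way to construct an `AsciiStr` is through this check,
    /// so possessing one *is* the proof that the bytes are ASCII.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.is_ascii() {
            // SAFETY: ASCII is a subset of UTF-8, so the bytes form a valid str.
            Some(AsciiStr(unsafe { std::str::from_utf8_unchecked(bytes) }))
        } else {
            None
        }
    }

    pub fn as_str(&self) -> &'a str {
        self.0
    }
}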
If you're trying to print "hello world" on the screen, you need to make a system call. Doing so is unsafe, by Rust's definition.
There are two main solutions to this:
1. Have some sort of "unsafe" language construct so that you can make syscalls yourself.
2. Disallow arbitrary syscalls, and put the code for making them inside some trusted code, whether that be the runtime or the standard library.
Rust has chosen option #1. Java has chosen #2.
(Also, virtually every language has an "unsafe mode" in whatever C FFI exists. I was referring to sun.misc.Unsafe earlier, but JNI still exists, of course.)
To rephrase my point: including an unsafe mode is an option when designing a language. Your original comment appeared to be suggesting that it was a necessity, and that one could not write a Hello World program in a language that lacks an unsafe mode. Which is not the case, of course; JavaScript, for instance.
unsafe lets you exit the small subset of code patterns whose safety the compiler can prove.
Not everything you write in unsafe is necessarily unsafe: the keyword only means that the compiler can't prove it safe. Another way of thinking about it: if unsafe only let you write genuinely unsafe code, it would be useless, because that unsafe code would eventually segfault.