Some race conditions are intentional at the expense of knowing that the data wou...

dekhn · on Oct 3, 2019

Hmm. I thought data races due to concurrent thread access are actually undefined behavior (up to and included termination of the program).

schoen · on Oct 3, 2019

Are you thinking of a particular programming language specification here? It seems like one programming language might define this this way, while another might simply say that the order of the events is not guaranteed (and may not be the one that the programmer expected), but not that the events themselves may do something other than the normal language semantics provide.

jcranmer · on Oct 3, 2019

C++ and C both explicitly say that data races are undefined behavior, not unspecified behavior.

Data races are one of those cases where "anything may go" undefined behavior is actually quite necessary. Actual hardware memory models (particularly on "weak" architectures like ARM or PowerPC, or especially the infamously weak Alpha) result in cases where the order of memory traffic is not a consistent view between different threads. Formally specifying the hardware memory model is actually surprisingly difficult, made all the more difficult when you want to give a hardware-agnostic definition [1]. And should you want to introduce hardware transactional memory, break down in despair at the complexity [2].

The great breakthrough in memory models is the definition of the "data-race-free" model. This model says that, if memory accesses are properly protected with locks (i.e., there are no "data races" [3]), you cannot observe a violation of sequential consistency, which makes reasoning much easier. Java used this model for its corrected memory model in Java 5, and then C++11 adapted Java's memory model, and C11 took C++11's model verbatim. The upshot of this approach is that, if you make data races undefined behavior, you don't have to try to figure out how typical memory optimizations, both by the compiler and by the physical hardware, affect visible memory. Trying to work out the effects of these optimizations on the memory system is extraordinarily difficult, and the benefits of doing so are manifestly unclear, since most programmers will need to properly synchronize their code anyways.

[1] The C++ memory model attempts to do this with atomic memory references. The resulting specification (in C++11) is generally considered to be wrong, and the question of how to fix it is still up in the air as of right now.

[2] PLDI 2018 had the first formal semantics that combined transactional memory and weak memory semantics, and went on to prove that ARM's hardware lock elision was broken. This should be showing just how difficult this is to grasp even for theoreticians pushing the boundary, let alone a language trying to make it accessible to typical programmers.

[3] There are two definitions of "data race" going on here. Vernacular definitions usually define it as accesses not protected by synchronization, which permit "benign" data races for regular loads/stores. C/C++ tweaks the definition so as to use it to proscribe undefined behavior, so "benign" data race is oxymoronic in those languages. However, the use of memory_order_relaxed is intended to indicate a permitted data race in the first sense but not the second sense.

schoen · on Oct 3, 2019

Thanks for the tremendous detail on this point.

Could you or someone else please give a minimal example of an undefined access in C, just so that I can be sure of whether we're talking about the same phenomena here?

jcranmer · on Oct 3, 2019

A minimal program (using C11 threads):

   #include <threads.h>
   int x;

   int do_thread(void *) {
     for (int i = 0; i < 10000; i++)
       x = i;
     return 0;
   }
   int main() {
     thrd_t t;
     thrd_create(&t, do_thread, NULL);
     do_thread(NULL);
     thrd_join(&t, NULL);
     return 0;
   }

There is an undefined data race on x, since it is simultaneously accessed from two threads without any synchronization. Replace the declaration of x with `_Atomic int x;`, and the resulting code is well-defined.

dekhn · on Oct 4, 2019

thanks, this comment (point 3 in particular) did a great service in clearing up the disconnect.

dekhn · on Oct 4, 2019

no programming languages, just the hardware. I think the only way to reasonably think about data races is at the hardware instruction/memory level, not the programming language.

gpderetta · on Oct 3, 2019

they very much are.

shaklee3 · on Oct 3, 2019

Can you point to a source on that? If you have two pthreads in C accessing the same variable, they will definitely not necessarily see the same load/store order, but it's perfectly valid as a form of lazy evaluation to assume the value eventually can be seen.

gpderetta · on Oct 3, 2019

Can't check chapter and verse on the standard right now. Will you take cppreference[1] as authoritative? See the section on data races. The C11 memory model only differ minimally form C++11.

[1] https://en.cppreference.com/w/cpp/language/memory_model

shaklee3 · on Oct 3, 2019

I'm aware of c++ memory models, but this is exactly what I was talking about. The actual behavior is defined by the hardware, in the programming language is an abstraction on top of that. For certain sizes, you will not get a load or a store that is only partially there. So on x64, you can safely share data between threads knowing that the observed order of that data is completely undefined. But in cases of looking into a view of statistics, that may be perfectly acceptable given the trade-offs.

jcranmer · on Oct 4, 2019

> The actual behavior is defined by the hardware, in the programming language is an abstraction on top of that.

This is a very dangerous way to understand semantics of programming languages. Programming languages are usually defined via some sort of operational semantics on an abstract machine which occasionally bears some resemblance to how processor semantics are defined. But sometimes there is a massive disconnect between processor semantics and language semantics--and perhaps nowhere is that disconnect greater than memory.

In hardware, there is a firm distinction between memory and registers. Any language which approximates hardware semantics (which includes any mid-level or low-level compiler IR) is going to mirror this distinction with the notion of a potentially infinite register set (if pre-register allocation) and use of explicit instructions to move values to and from memory. But in every high-level language, there is no concept of a memory/register distinction: just values floating around in a giant bag of them, often no notion of loads or stores. You can approximate this by annotating values (as C does, with the volatile specifier), but the resulting semantics end up being muddy because you're relying on an imperfect mapping to get to the intended semantics (don't delete this load/store). Optimization that screws up the mapping of the language memory model to the hardware memory model is anticipated, even 30 years ago--that's why C has a volatile keyword.

It is possible, if you are very careful, to write C code that exposes the underlying hardware model that allows you to reason about your program using that model. But this is not typical C code, and instead requires modifying your code in such a way that you make the intended loads and stores explicit. And if you write your code in the typical way and expect it to map cleanly to hardware semantics, you're going to have a bad time.

shaklee3 · on Oct 4, 2019

Coming from a world where latency and performance is paramount, I think it's perfectly acceptable to write code targeted at a specific architecture you know won't change. I understand your points though, and it's definitely not a style you should use in 99% of programming.

IggleSniggle · on Oct 3, 2019

I would call this sampling that could turn into a race condition if it stops being treated as sampling.