"->" is taken from the IBM PL/I language, which was an important source of various syntax elements of C (the other 2 main sources being BCPL and Algol 68).
It was added because the indirection operator "*" is in the wrong position in C: it is prefix instead of postfix, as it was in the languages where it first appeared, i.e. Euler and then Pascal, which inherited it from Euler (both are Wirth languages).
Had "*" been postfix, ".*" could have always been used instead of "->". As it is, "->" eliminates a pair of parentheses that is needed when the indirection is followed by other postfix operators, like array indexing or structure member selection, which happens very frequently.
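To make the parentheses point concrete, here is a minimal sketch. The struct and function names (point, shape, node, corner_y_star and so on) are made up for illustration; only the operators themselves come from the discussion above.

```c
#include <assert.h>

struct point { int x, y; };
struct shape { struct point corners[4]; };
struct node  { int value; struct node *next; };

/* With prefix "*", parentheses are needed: postfix "." and "[]"
   bind tighter than the indirection. */
int corner_y_star(const struct shape *s, int i) { return (*s).corners[i].y; }

/* The same access written with "->". */
int corner_y_arrow(const struct shape *s, int i) { return s->corners[i].y; }

/* Chained indirection nests the parentheses... */
int second_value_star(const struct node *n) { return (*(*n).next).value; }

/* ...while "->" reads left to right. */
int second_value_arrow(const struct node *n) { return n->next->value; }

/* Small self-check; call it from main() to try the sketch. */
void check_arrow_equivalence(void) {
    struct shape s = { { {1, 2}, {3, 4}, {5, 6}, {7, 8} } };
    struct node tail = { 20, 0 };
    struct node head = { 10, &tail };
    assert(corner_y_star(&s, 2) == corner_y_arrow(&s, 2));
    assert(second_value_star(&head) == second_value_arrow(&head));
}
```

Both spellings compile to the same access; the "->" forms just avoid the nesting.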
IIRC, in one of his papers about the history of C, Dennis Ritchie acknowledged several mistakes in the design of C, one of them being the prefix "*", along with others like the precedence of "&" and "|".
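For anyone who has not been bitten by the "&" precedence yet, a small sketch of the classic trap. The function names are hypothetical, and a compiler with warnings enabled will likely complain about the first one, which is rather the point.

```c
#include <assert.h>

/* Intended meaning: "is the low bit of x clear?"
   Because "==" binds tighter than "&", this parses as x & (1 == 0),
   i.e. x & 0, so it returns 0 for every x. */
int low_bit_clear_buggy(unsigned x) {
    return x & 1 == 0;
}

/* Explicit parentheses give the intended test. */
int low_bit_clear_fixed(unsigned x) {
    return (x & 1) == 0;
}

/* Small self-check; call it from main() to try the sketch. */
void check_precedence(void) {
    assert(low_bit_clear_buggy(2u) == 0);  /* wrong answer: 2 is even */
    assert(low_bit_clear_fixed(2u) == 1);
    assert(low_bit_clear_fixed(3u) == 0);
}
```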
In PL/I "->" was the only indirection operator, while CPL had no explicit indirection: it was implicit whenever a pointer was used in an expression, as with C++ references. The CPL pointers (which were named references; "pointer" is a term introduced later by PL/I) could use a special assignment operator, which assigned the address of an expression, not its value.
While "->" is somewhat redundant, its sole purpose being to eliminate some parentheses, it is no more redundant than e.g. the existence of both "for" and "while" or of "do ... while" when "break" exists, or of the comma operator, or of several other redundant features of C, which offer alternative ways of writing the same thing, with slightly shorter variants for certain use cases. While C has much less redundancy than Perl (which is proud of it), it still has much more than I consider good for a programming language.
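A quick sketch of the kind of redundancy meant here, with made-up function names: the same loop written with "for" and with "while", and a "do ... while" rewritten as an endless loop with "break".

```c
#include <assert.h>

/* Summing with "for"... */
int sum_for(const int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* ...and the same loop spelled with "while". */
int sum_while(const int *v, int n) {
    int total = 0, i = 0;
    while (i < n) {
        total += v[i];
        i++;
    }
    return total;
}

/* Counting decimal digits with "do ... while"... */
int count_digits_do(unsigned x) {
    int digits = 0;
    do {
        digits++;
        x /= 10;
    } while (x != 0);
    return digits;
}

/* ...and with an endless loop plus "break". */
int count_digits_break(unsigned x) {
    int digits = 0;
    for (;;) {
        digits++;
        x /= 10;
        if (x == 0)
            break;
    }
    return digits;
}

/* Small self-check; call it from main() to try the sketch. */
void check_loop_redundancy(void) {
    int v[] = { 1, 2, 3, 4 };
    assert(sum_for(v, 4) == sum_while(v, 4));
    assert(count_digits_do(12345u) == count_digits_break(12345u));
}
```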
When structures were added to C, the designers realized that the wrong position of "*" required extra parentheses, but they must have had plenty of code written in late B and early C that used "*", so instead of redefining "*" they chose the easier path of taking the structures from PL/I together with all their syntax elements used in PL/I, i.e. the keyword "struct" and the operators "." and "->".
Hey array indexing is postfix, so let's just use the old "arrays decay to pointers" staple, but obviously in reverse:
#include <stdio.h>

typedef struct {
    int a;
    int b;
} foo;

int main(void) {
    foo one = { 1, 2 };
    foo * const p = &one;
    p[0].b = 13; // Look ma no -> !
    printf("Now we have { %d, %d }\n", one.a, one.b);
    return 0;
}
Of course one could go all nasty and flip the base and index:
0[p].b = 13;
But hey that's just being silly.
In all seriousness, thanks a lot for the historic review of C syntax! Very interesting, as someone who really likes C enough to (ab)use it whenever possible.
I know that a->b is equivalent to (*a).b, but still the difference makes sense to my brain which first learned assembly. It's much "lighter-feeling", not having to evaluate the "entire" structure just to get to a single field. :) Weird, but I really like this little piece of syntactic sugar. Yum.
Yes, I believe there was a variant where struct members had global identifier scope. This means that in order to compile x->mem or x.mem, you don't need to know the type of x at all: mem alone tells you the offset to use. In this context, the distinction between -> and . matters because you don't have the type to tell whether an indirection is needed.
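A rough modern emulation of that idea, not a reconstruction of how any early compiler actually worked. The macro names B_OFFSET, ARROW_B and DOT_B are hypothetical: the member name is treated as nothing more than "an int at this offset", and the only thing the two operators decide is where the base address comes from.

```c
#include <assert.h>
#include <stddef.h>

struct foo { int a; int b; };

/* Pretend "b" is a global member name that only means
   "an int at this byte offset", whatever the operand's type is. */
#define B_OFFSET offsetof(struct foo, b)

/* x->b: the VALUE of x is the base address. */
#define ARROW_B(x) (*(int *)((char *)(x) + B_OFFSET))

/* x.b: the ADDRESS of x is the base address. */
#define DOT_B(x) (*(int *)((char *)&(x) + B_OFFSET))

/* Small self-check; call it from main() to try the sketch. */
void check_offsets(void) {
    struct foo one = { 1, 2 };
    struct foo *p = &one;
    assert(ARROW_B(p) == 2);
    assert(DOT_B(one) == 2);
    DOT_B(one) = 13;          /* write through the "." form */
    assert(p->b == 13);       /* visible through the real "->" */
}
```

Neither macro looks at the type of its operand, so without two distinct operators there would be no way to tell which base address is wanted.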
It's surprising that unified -> and . wasn't widely added as a compiler extension. Most compilers already do the adjustment as part of error recovery, and most of them accept much more dubious type errors by default (things that never were part of any C standard). GDB handles it just fine, though.
Personally I like the distinction between them as it gives me context for the object being a pointer, which helps with "at-a-glance" reading of clean code.
I discovered long ago that the -> . dichotomy makes it essentially impossible to refactor code by switching a type to/from a value/ref type. It is a major factor in C being a very difficult language to refactor code in. Long-lived C code still retains its original design mistakes because of that.
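A tiny sketch of what that looks like in practice, with hypothetical type and function names: changing one member from a value to a pointer forces an edit at every place that touches it.

```c
#include <assert.h>

struct config { int level; };

/* Original design: the config is embedded by value. */
struct app_by_value { struct config cfg; };

/* Refactored design: the config is held through a pointer. */
struct app_by_pointer { struct config *cfg; };

/* Every access site has to follow the declaration:
   ".level" before the change... */
int level_by_value(const struct app_by_value *app) {
    return app->cfg.level;
}

/* ..."->level" after it, although the intent is identical. */
int level_by_pointer(const struct app_by_pointer *app) {
    return app->cfg->level;
}

/* Small self-check; call it from main() to try the sketch. */
void check_refactor(void) {
    struct config shared = { 3 };
    struct app_by_value a = { { 3 } };
    struct app_by_pointer b = { &shared };
    assert(level_by_value(&a) == level_by_pointer(&b));
}
```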