It's great to see the 68k backends of GCC and LLVM getting love again.
In many respects the 68000 is the oldest, most "retro" architecture for which you can still run a modern compiler suite: a processor introduced in 1979 for which you can compile a Rust binary today. That's pretty amazing.
The ISA is really beautiful (maybe with the exception of the BCD additions). I have an m68k disk image/kernel that works with qemu-system-m68k, kept out of historical sentiment (I used to have an A500; now I have an A4000, but sadly no time to play with it, so it sits in my basement).
As for the ABI/syscalls: I noticed that strace reports hundreds of calls to get_thread_area() under Linux/m68k. I presume that under x86, ARM, etc. some CPU register is used to store a pointer to the TLS block, but on m68k "they" decided to invoke a syscall every single time. I don't think that's very efficient, and it also pollutes strace output, unfortunately, given how frequently TLS is accessed these days.
I have been playing with defining my own toy ABI, and I came up with a trick for thread-local storage:
Allocate the segment for TLS and the thread stack together so that both start (stack grows down, TLS up) at a 2^n-aligned address, with n large enough that the stack size never exceeds 2^n.
Then you could always get TLS-1 from SP by setting the lower n bits to 1.
To avoid an odd size for the segment, you could instead put the TLS block at a negative offset from the end of the segment.
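As a rough C sketch of the idea (the segment size, the TLS block size, and the use of __builtin_frame_address() as a stand-in for reading SP are all assumptions of mine, not part of any real ABI):

  #include <stdint.h>

  #define SEG_BITS 20                       /* n: segment is 2^20 bytes, 2^20-aligned */
  #define SEG_MASK ((1UL << SEG_BITS) - 1)  /* the lower n bits */
  #define TLS_SIZE 4096                     /* size of the TLS block, arbitrary */

  /* Recover the TLS block from any address inside the thread's stack, with
     no syscall and no dedicated register: setting the lower n bits of SP
     to 1 gives the last byte of the segment ("TLS-1" above), and here the
     TLS block sits at a fixed negative offset from that end. */
  static inline void *tls_from_sp(void)
  {
      uintptr_t sp = (uintptr_t) __builtin_frame_address(0);  /* approximates SP */
      uintptr_t seg_last = sp | SEG_MASK;                     /* end of segment - 1 */
      return (void *) (seg_last + 1 - TLS_SIZE);
  }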
I had intended mine for machines with 64-bit pointers, but data structures tend to be smaller on 32-bit machines, so they can cope with smaller stacks.
On the m68k Amiga the stack size was fixed and often relatively small: the recommended minimum was 16000 bytes, and for large programs the recommended size was 80000.
Modern operating systems for CPUs with MMUs grow the stack dynamically but still have to set aside a part of the address space for the maximum it could grow into.
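For illustration, a minimal POSIX sketch of that "reserve the address space now, commit later" idea (the sizes are arbitrary, and a real kernel commits stack pages transparently on page faults rather than via explicit mprotect calls):

  #include <stdio.h>
  #include <sys/mman.h>

  #define MAX_STACK   (8UL * 1024 * 1024)  /* address space reserved up front */
  #define FIRST_CHUNK (64UL * 1024)        /* portion made usable immediately */

  int main(void)
  {
      /* Reserve: PROT_NONE means no physical memory is committed yet. */
      char *base = mmap(NULL, MAX_STACK, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (base == MAP_FAILED) { perror("mmap"); return 1; }

      /* Commit only the topmost chunk; stacks grow down from the high end. */
      if (mprotect(base + MAX_STACK - FIRST_CHUNK, FIRST_CHUNK,
                   PROT_READ | PROT_WRITE) != 0) { perror("mprotect"); return 1; }

      printf("reserved %lu bytes at %p, %lu usable at the top\n",
             MAX_STACK, (void *) base, FIRST_CHUNK);
      return 0;
  }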
I wouldn't say anything with separate register files for addresses and data is "beautiful", but it's certainly miles better than other microprocessors of the same era, such as the 65816.
I still remember the time on a BBS that someone talked me out of getting an Apple IIgs and getting an Amiga 500 instead. I wish I remembered his handle.
> Maybe it is all related to errno needing to read the base TLS pointer?
Probably everything that uses __thread requires this, e.g. malloc(), or functions that return pointers to static buffers, which the implementers wanted to make thread-safe.
  /*
    The interface of this function is completely stupid,
    it requires a static buffer. We relax this a bit in
    that we allow one buffer for each thread.
  */
  static __thread char buffer[18];

  char *inet_ntoa (struct in_addr in)
  {
    unsigned char *bytes = (unsigned char *) &in;
    __snprintf (buffer, sizeof (buffer), "%d.%d.%d.%d",
                bytes[0], bytes[1], bytes[2], bytes[3]);
    return buffer;
  }
Well... I just noticed that the implementation of inet_ntoa in glibc is a bit suspect. Using %d with a char, without casting to (int) or using %hhd, is tricky: depending on memory alignment rules, and on whether arguments are passed in registers or on the stack, this can produce incorrect results.
This is because arguments on the stack can be read as ints (typically in batches of 4 octets) but stored there as char (typically 1 octet, though alignment might force them to 4-byte boundaries; I guess it depends). Interesting.
Maybe that is a form of exploit mitigation? Maybe they can't cache the TLS pointer because the kernel may change it out from under you without warning? I do agree that it smells like a possible performance issue and it seems like there must be a better way to do it.
> There are no spare registers available to designate as the thread register. Therefore, kernel magic is needed to obtain the thread pointer from userspace.
I guess there was an unused 'fs' segment register on Intel; there was nothing left on m68k.
This doesn't really answer the question. Even if there isn't a free register, the value could be stored on the stack to avoid the syscall overhead. Or maybe I'm misunderstanding what that variable is doing. At the very least, the value could be obtained once via the kernel and then cached in main memory or a general-purpose register.
The value is per thread, so if you're caching it to main memory (including in the stack) then you need one copy per thread and you need to somehow know which one to use based on what thread is running. You'd need to store that index somewhere so there'd be an infinite recursion. Even if it's stored at the base of the current stack you don't want to have to walk all the stack frames to find it.
You can store it in a GPR instead (that's what some modern architectures do) but then it needs to be part of the ABI. Otherwise at the beginning of a function you can't assume any register contains the correct value.
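To make that concrete, here is a rough sketch of what the dedicated-register approach looks like on two ABIs that reserve one (illustrative inline asm; the fallback branch just signals that m68k has nothing equivalent to read):

  /* On ABIs that dedicate a register to the thread pointer, reading it is a
     single instruction and needs no syscall at all. */
  static inline void *thread_pointer(void)
  {
  #if defined(__x86_64__)
      void *tp;
      /* The TCB is addressed via %fs; its first word points to itself. */
      __asm__ ("movq %%fs:0, %0" : "=r" (tp));
      return tp;
  #elif defined(__aarch64__)
      void *tp;
      /* AArch64 reserves the system register TPIDR_EL0 for this. */
      __asm__ ("mrs %0, tpidr_el0" : "=r" (tp));
      return tp;
  #else
      return 0;  /* no reserved register in the ABI: ask the kernel, as m68k does */
  #endif
  }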
I'm speculating heavily here, but TLS seems to be a userland thing, albeit with OS support (e.g. the CLONE_SETTLS flag and the *_thread_area() syscalls).
Maybe it was thought of more as a thread-level thing than a runtime thing. E.g. if there are several competing runtimes in one process (say Go, a JVM, and libc), they can all obtain the TLS pointer via some unified method (a register or a syscall). Otherwise there would have to be some chosen runtime that keeps the pointer and hands it around, which would create another set of problems.
Putting it at the top of the initial process stack would work, I guess, though.
The AmigaOS ABI was really designed for the m68k; it leverages registers as much as possible: values passed in data registers, pointers in address registers, A4 for local data, A6 for the library base pointer, results in D0.
That said, most library routines would then store the register contents on the stack and restore them on return, so I'm not sure it's more efficient for memory access etc.
The m68k is the most CISC of the ISAs I've seen: plentiful addressing modes and register orthogonality. It had addressing modes for auto post-increment/pre-decrement, indirect indexed accesses, and all sorts of fun.
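A toy example of what those modes buy you; the instruction in the comment is what a 68k compiler can emit for the loop body (indicative, not guaranteed):

  /* Classic byte copy: the post-increment addressing mode lets the whole body
     become "move.b (a0)+,(a1)+", and the move itself sets the condition codes
     for the branch that follows. */
  void copy_string(char *dst, const char *src)
  {
      while ((*dst++ = *src++) != '\0')
          ;   /* loop: move.b (a0)+,(a1)+ ; bne.s loop */
  }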
The 68k series are rather simple designs by today's standards and should be old enough to have lost patent protection. How hard would it be to reimplement the ISA as a Verilog/VHDL softcore?
I really miss my copy of the m68k programmer's reference manual. When I finally had to learn x86, I remember being so frustrated at how fractured it was compared to the beauty that was the m68k at the time.
Much more blue cover - I think I kept it until about 10 years ago, when I did a purge of physical books that didn't have a fairly deep sentimental attachment. Of course, that still ended up being hundreds of pounds of paper.
It would be nice to see the m68k get some 64-bit love to bring it right up to date. I've been considering what such a beast could look like compared to the cruddy x86-64 ISA.
The moment you start doing that you're going to lose backwards compatibility anyway, so you may as well go back to the drawing board and revisit other compromises made in the ISA (the separate data and address registers, for example, or big-endian instead of little-endian byte order), and you end up with something else, IMHO.
Maybe something closer to a 64-bit version of the NS32k ISA.
Yes, the instruction set is laid out rather beautifully. This is something I'd really like to see implemented in 64 bits with support for true SMP, to see how well it performs against other architectures.
It's a nice CISC, but the two separate register files are something you'd be silly to carry over to 64-bit.
I think the lesson from x86 is that it's entirely possible to make a legacy CISC ISA perform really well with enough engineering. But the question is: why do that for an extinct arch? ColdFire attempted to revive the ISA, if only in 32-bit form, and that didn't work out either.