It's great to see the 68k backends of GCC and LLVM getting love again.
In many respects the 68000 is the oldest, most "retro" architecture for which you can still run a modern compiler suite: a processor introduced in 1979 for which you can compile a Rust binary today. That's pretty amazing.
The ISA is really beautiful (maybe with the exception of the BCD additions). I have an m68k disk image/kernel that works with qemu-system-m68k, kept out of historical sentiment (I used to have an A500; now I have an A4000, but sadly no time to play with it, so it sits in my basement).
As for the ABI/syscalls: I noticed that strace reports hundreds of calls to get_thread_area() under Linux/m68k. I presume that under x86, ARM, etc. some CPU register is used to store a pointer to the TLS block, but on m68k "they" decided to invoke a syscall every single time. I don't think that's very efficient, and it also pollutes strace output, unfortunately, given how frequently TLS is accessed these days.
I have been playing with defining my own toy ABI, and I came up with a trick for thread-local storage:
Allocate the segment for TLS and the thread stack together so that both start (stack grows down, TLS up) at a 2^n-aligned address, with n large enough that the stack size never exceeds 2^n.
Then you could always get TLS-1 from SP by setting the lower n bits to 1.
To avoid an odd size for the segment, you could instead put the TLS block at a negative offset from the end of the segment.
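As a rough C sketch of the idea (the segment size, the TLS block size, and the use of __builtin_frame_address() as a stand-in for reading SP are all assumptions of mine, not part of any real ABI):

  #include <stdint.h>

  #define SEG_BITS 20                       /* n: segment is 2^20 bytes, 2^20-aligned */
  #define SEG_MASK ((1UL << SEG_BITS) - 1)  /* the lower n bits */
  #define TLS_SIZE 4096                     /* size of the TLS block, arbitrary */

  /* Recover the TLS block from any address inside the thread's stack, with
     no syscall and no dedicated register: setting the lower n bits of SP
     to 1 gives the last byte of the segment ("TLS-1" above), and here the
     TLS block sits at a fixed negative offset from that end. */
  static inline void *tls_from_sp(void)
  {
      uintptr_t sp = (uintptr_t) __builtin_frame_address(0);  /* approximates SP */
      uintptr_t seg_last = sp | SEG_MASK;                     /* end of segment - 1 */
      return (void *) (seg_last + 1 - TLS_SIZE);
  }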
I had intended mine for machines with 64-bit pointers, but data structures tend to be smaller on 32-bit machines, so they can cope with smaller stacks.
On the m68k Amiga the stack size was fixed and often relatively small: the recommended minimum was 16000 bytes, and for large programs the recommended size was 80000.
Modern operating systems for CPUs with MMUs grow the stack dynamically but still have to set aside a part of the address space for the maximum it could grow into.
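For illustration, a minimal POSIX sketch of that "reserve the address space now, commit later" idea (the sizes are arbitrary, and a real kernel commits stack pages transparently on page faults rather than via explicit mprotect calls):

  #include <stdio.h>
  #include <sys/mman.h>

  #define MAX_STACK   (8UL * 1024 * 1024)  /* address space reserved up front */
  #define FIRST_CHUNK (64UL * 1024)        /* portion made usable immediately */

  int main(void)
  {
      /* Reserve: PROT_NONE means no physical memory is committed yet. */
      char *base = mmap(NULL, MAX_STACK, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (base == MAP_FAILED) { perror("mmap"); return 1; }

      /* Commit only the topmost chunk; stacks grow down from the high end. */
      if (mprotect(base + MAX_STACK - FIRST_CHUNK, FIRST_CHUNK,
                   PROT_READ | PROT_WRITE) != 0) { perror("mprotect"); return 1; }

      printf("reserved %lu bytes at %p, %lu usable at the top\n",
             MAX_STACK, (void *) base, FIRST_CHUNK);
      return 0;
  }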
I wouldn't say anything with separate register files for addresses and data is "beautiful", but it's certainly miles better than other microprocessors of the same era, such as the 65816.
I still remember the time on a BBS that someone talked me out of getting an Apple IIgs and getting an Amiga 500 instead. I wish I remembered his handle.
> Maybe it is all related to errno needing to read the base TLS pointer?
Probably everything that uses __thread requires this, e.g. malloc(), or functions that return pointers to static buffers, which the implementers wanted to make thread-safe.
  /*
    The interface of this function is completely stupid,
    it requires a static buffer. We relax this a bit in
    that we allow one buffer for each thread.
  */
  static __thread char buffer[18];

  char *inet_ntoa (struct in_addr in)
  {
    unsigned char *bytes = (unsigned char *) &in;
    __snprintf (buffer, sizeof (buffer), "%d.%d.%d.%d",
                bytes[0], bytes[1], bytes[2], bytes[3]);
    return buffer;
  }
Well... I just noticed that the implementation of inet_ntoa in glibc is a bit suspect. Using %d with a char, without casting to (int) or using %hhd, is tricky: depending on memory alignment rules, and on whether arguments are passed in registers or on the stack, this can produce incorrect results.
This is because arguments on the stack can be read as ints (typically in batches of 4 octets) but stored there as char (typically 1 octet, though alignment might force them to 4-byte boundaries; I guess it depends). Interesting.
Maybe that is a form of exploit mitigation? Maybe they can't cache the TLS pointer because the kernel may change it out from under you without warning? I do agree that it smells like a possible performance issue and it seems like there must be a better way to do it.
> There are no spare registers available to designate as the thread register. Therefore, kernel magic is needed to obtain the thread pointer from userspace.
I guess there was an unused 'fs' segment register on Intel; there was nothing left on m68k.
This doesn't really answer the question. Even if there isn't a free register, the value could be stored on the stack to avoid the syscall overhead. Or maybe I'm misunderstanding what that variable is doing. At the very least, the value could be obtained once via the kernel and then cached in main memory or a general-purpose register.
The value is per thread, so if you're caching it to main memory (including in the stack) then you need one copy per thread and you need to somehow know which one to use based on what thread is running. You'd need to store that index somewhere so there'd be an infinite recursion. Even if it's stored at the base of the current stack you don't want to have to walk all the stack frames to find it.
You can store it in a GPR instead (that's what some modern architectures do) but then it needs to be part of the ABI. Otherwise at the beginning of a function you can't assume any register contains the correct value.
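To make that concrete, here is a rough sketch of what the dedicated-register approach looks like on two ABIs that reserve one (illustrative inline asm; the fallback branch just signals that m68k has nothing equivalent to read):

  /* On ABIs that dedicate a register to the thread pointer, reading it is a
     single instruction and needs no syscall at all. */
  static inline void *thread_pointer(void)
  {
  #if defined(__x86_64__)
      void *tp;
      /* The TCB is addressed via %fs; its first word points to itself. */
      __asm__ ("movq %%fs:0, %0" : "=r" (tp));
      return tp;
  #elif defined(__aarch64__)
      void *tp;
      /* AArch64 reserves the system register TPIDR_EL0 for this. */
      __asm__ ("mrs %0, tpidr_el0" : "=r" (tp));
      return tp;
  #else
      return 0;  /* no reserved register in the ABI: ask the kernel, as m68k does */
  #endif
  }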
I'm speculating heavily here, but TLS seems to be a userland thing, albeit with OS support (e.g. the CLONE_SETTLS flag and the *_thread_area() syscalls).
Maybe it was thought of more as a thread-level thing than a runtime thing. E.g. if there are several competing runtimes in one process (say Go, a JVM, and libc), they can all obtain the TLS pointer via some unified method (a register or a syscall). Otherwise there would have to be some chosen runtime that keeps the pointer and hands it around, which would create another set of problems.
Putting it at the top of the initial process stack would work, I guess, though.
The AmigaOS ABI was really designed for the m68k; it leverages registers as much as possible: values passed in data registers, pointers in address registers, A4 for local data, A6 for the library base pointer, results in D0.
That said, most library routines would then store the register contents on the stack and restore them on return, so I'm not sure it's more efficient for memory access etc.
The m68k is the most CISC of the ISAs I've seen: plentiful addressing modes and register orthogonality. It had addressing modes for auto post-increment/pre-decrement, indirect indexed accesses, and all sorts of fun.
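A toy example of what those modes buy you; the instruction in the comment is what a 68k compiler can emit for the loop body (indicative, not guaranteed):

  /* Classic byte copy: the post-increment addressing mode lets the whole body
     become "move.b (a0)+,(a1)+", and the move itself sets the condition codes
     for the branch that follows. */
  void copy_string(char *dst, const char *src)
  {
      while ((*dst++ = *src++) != '\0')
          ;   /* loop: move.b (a0)+,(a1)+ ; bne.s loop */
  }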
The 68k series are rather simple designs by today's standards and should be old enough to have lost patent protection. How hard would it be to reimplement the ISA as a Verilog/VHDL softcore?
I really miss my copy of the m68k programmer's reference manual. When I finally had to learn x86, I remember being so frustrated at how fractured it was compared to the beauty that was the m68k at the time.
Much more blue cover - I think I kept it until about 10 years ago, when I did a purge of physical books that didn't have a fairly deep sentimental attachment. Of course, that still ended up being hundreds of pounds of paper.
It would be nice to see the m68k get some 64-bit love to bring it right up to date. I've been considering what such a beast could look like compared to the cruddy x86-64 ISA.
The moment you start doing that you're going to lose backwards compatibility anyway, so you may as well go back to the drawing board and revisit other compromises made in the ISA (the separate data and address registers, for example, or big-endian instead of little-endian byte order), and you end up with something else, IMHO.
Maybe something closer to a 64-bit version of the NS32k ISA.
Yes, the instruction set is laid out rather beautifully. This is something I'd really like to see implemented in 64 bits with support for true SMP, to see how well it performs against other architectures.
It's a nice CISC, but the two separate register files are something you'd be silly to carry over to 64-bit.
I think the lesson from x86 is that it's entirely possible to make a legacy CISC ISA perform really well with enough engineering. But the question is: why do that for an extinct arch? ColdFire attempted to revive the ISA, if only in 32-bit form, and that didn't work out either.