This reminds me of some playing around I did with x86 instruction decoding. I've...

kenanb · on Jan 8, 2024

Wow, that's a very interesting summary of the encoding of the opcode itself. Thanks!

I find this part the most challenging. It is relatively easy to figure out the "mapping between the assembly and instruction" when you have both in front of you already, as I did in my posts.

But I would have difficulty translating from one to the other, because the opcode encoding is difficult. You can actually see me intentionally handwaving it in an earlier post: https://blog.kenanb.com/code/low-level/2024/01/04/x86-insn-e...

Note for other people reading both my post and the comment above:

The terms r0 and r3 in the last paragraph correspond to what I describe in my post as the "register code" or "opcode extension" values stored in the "ModR/M.reg" field. In this case, the values 0 and 3 are "opcode extensions". You can see both instructions here: http://ref.x86asm.net/coder64.html#x0F01 . The values 0 (for SGDT) and 3 (for LIDT) are shown in the column called "o", which is defined as: "Register/ Opcode Field"
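
To make the "opcode extension" idea concrete, here is a rough Python sketch of pulling the reg field out of a ModR/M byte and using it to pick between the two 0F 01 instructions mentioned above. The function names and the two-entry table are made up for illustration; the real 0F 01 group has more entries than the two listed here, and this ignores prefixes, SIB bytes and displacements entirely.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11   # top 2 bits
    reg = (byte >> 3) & 0b111  # middle 3 bits: register code OR opcode extension
    rm = byte & 0b111          # low 3 bits
    return mod, reg, rm


# Hypothetical table holding only the two entries discussed in this thread.
GROUP_0F01 = {0: "SGDT", 3: "LIDT"}


def decode_0f01(insn):
    """Name a 0F 01 /reg instruction from its first three bytes (sketch only)."""
    if bytes(insn[:2]) != bytes([0x0F, 0x01]):
        raise ValueError("not a 0F 01 instruction")
    _mod, reg, _rm = split_modrm(insn[2])
    # Here reg does not name a register; it extends the opcode.
    return GROUP_0F01.get(reg, "(not covered by this sketch)")
```

For example, decode_0f01(bytes([0x0F, 0x01, 0x18])) looks at ModR/M byte 0x18, whose reg field is 3, and so picks LIDT.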

moonchild · on Jan 8, 2024

It could definitely be clearer, but what makes you say the manual is misleading?

jcranmer · on Jan 9, 2024

The two main things:

* F0 and F2 are not mutually exclusive; there are a few instructions that use both prefixes simultaneously.

* It's a little bit more helpful, IMHO, to think of the prefixes not as prefixes but as extra bits to the opcode, so 0x66 0x00 and 0x00 are different instructions that just happen to both be ADD of different sizes.
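
A rough Python sketch of the "prefix as extra opcode bits" view, using the SSE opcode 0F 58 as the example because the four ADD variants there differ only in the prefix byte. The table and function names are hypothetical, and this deliberately ignores REX, VEX, and multiple or reordered prefixes, so treat it as an illustration rather than a real decoder.

```python
# Key the lookup on (prefix, opcode) instead of treating the prefix as a modifier.
# None means "no 66/F2/F3 prefix present".
SSE_0F58 = {
    (None, 0x58): "ADDPS",
    (0x66, 0x58): "ADDPD",
    (0xF3, 0x58): "ADDSS",
    (0xF2, 0x58): "ADDSD",
}


def decode_sse_add(insn):
    """Name a [prefix] 0F 58 instruction (sketch; at most one prefix byte handled)."""
    i = 0
    prefix = None
    if insn[i] in (0x66, 0xF2, 0xF3):
        prefix = insn[i]  # the prefix acts as extra opcode bits here
        i += 1
    if insn[i] != 0x0F:
        raise ValueError("expected the 0F escape byte")
    opcode = insn[i + 1]
    return SSE_0F58.get((prefix, opcode), "(not covered by this sketch)")
```

With this table, the byte sequences 0F 58 C1 and F2 0F 58 C1 come out as two different instructions (ADDPS and ADDSD) rather than one instruction with an optional modifier.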

In general, the section on "legacy" prefixes seems to have been largely untouched since the 32-bit days, so it's written as if it's mostly talking about 8086-style instructions, whereas the actual implementation (especially when you start getting to the SSE instructions) has diverged somewhat from the original meanings of those prefixes, instead just stealing them for extra opcode bits.