I'd never heard of it myself, and reading that Wikipedia page it seems to have been a collection of every possible technology that didn't pan out in IC-language-OS codesign.
Meanwhile in Britain, a few years later (1985), a small company and a dedicated engineer, Sophie Wilson, decided that what they needed was a RISC processor that was as plain and straightforward as possible ...
ARM64 also has fixed-length 32-bit instructions. Yes, immediates are normally small, and the encoding isn't particularly orthogonal about how many immediate bits are available.
The largest MOV available is 16 bits, but those 16 bits can be shifted by 0, 16, 32 or 48 bits, so the worst case for a 64-bit immediate is 4 instructions. Or the compiler can decide to put the data in a PC-relative pool and use ADR or ADRP to calculate the address.
ADD immediate is 12 bits but can optionally apply a 12-bit left-shift to that immediate, so for immediates up to 24 bits it can be done in two instructions.
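A sketch of what this looks like in practice (GNU assembler syntax; the constants are just examples):

    // Worst case: build a 64-bit immediate with MOVZ plus three MOVKs.
    movz    x0, #0xDEF0                 // x0 = 0x000000000000DEF0, other bits zeroed
    movk    x0, #0x9ABC, lsl #16        // insert bits 31:16, keep the rest
    movk    x0, #0x5678, lsl #32        // insert bits 47:32
    movk    x0, #0x1234, lsl #48        // insert bits 63:48

    // A 24-bit addition done as two 12-bit ADD immediates.
    add     x1, x1, #0xABC, lsl #12     // add bits 23:12
    add     x1, x1, #0x123              // add bits 11:0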
ARM64 decoding is also pretty complex, far less orthogonal than ARM32. Then again, ARM32 was designed to be decodable on a chip with 25,000 transistors, not one where you can spend thousands of transistors decoding a single instruction.
ARM64 assembly has a MOV instruction, but for most of the ways it's used it's an assembler alias for something else. For example, MOV between two registers actually generates ORR Xd, XZR, Xm, i.e. Xd := (zero register) OR Xm. And a MOV with a suitable bitmask immediate is ORR Xd, XZR, #imm.
If you're setting or copying the stack pointer, the underlying instruction is instead ADD, e.g. ADD SP, Xn, #0, i.e. SP := Xn + 0. This is because the stack pointer and zero register are both encoded as register 31 (0b11111): some instructions interpret that encoding as the zero register, others as the stack pointer. ORR gets the zero register, and ADD immediate gets the stack pointer.
NOP maps to HINT #0. There are 128 HINT values available; anything not implemented on this processor executes as a NOP.
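A few of these aliases side by side (GNU assembler syntax; the right-hand column is what a disassembler with no alias support would print):

    mov     x0, x1                      // encoded as: orr  x0, xzr, x1
    mov     x0, #0x5555555555555555     // encoded as: orr  x0, xzr, #0x5555555555555555 (bitmask immediate)
    mov     sp, x2                      // encoded as: add  sp, x2, #0
    nop                                 // encoded as: hint #0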
There are other operations that are aliased: CMP Xm, Xn, for example, is really SUBS XZR, Xm, Xn, i.e. subtract Xn from Xm, store the result in the zero register [discarding it], and set the flags. RISC-V doesn't have flags, of course. ARM Ltd clearly considered them still useful.
There are other oddities too: 'rotate right', for example, is encoded as 'extract register from pair of registers' with the same source register specified twice.
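Concretely:

    cmp     x0, x1                      // encoded as: subs xzr, x0, x1
    ror     x2, x3, #8                  // encoded as: extr x2, x3, x3, #8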
Disassemblers do their best to hide this from you. ARM list a 'preferred decoding' for any instruction that has aliases, to map back to a more meaningful alias wherever possible.
Toyota's hybrids, at least, have valves in the hydraulic system. If everything is working, the driver's pedal is isolated from the physical pistons. Pressing the pedal instead moves a 'stroke simulator' (a cylinder with a spring in it), and the pressure is measured with a transducer. The Brake ECU tries to satisfy as much of the braking demand as possible through regenerative braking, applying the rear brakes to keep the car balanced, and the front brakes if you demand more braking than regeneration can provide or the battery can absorb.
If there's a failure of the electrical supply to the brake ECU, or another fault condition occurs, various valves then revert to their normally-open or normally-closed positions to allow hydraulic pressure from the pedal through to the brake cylinders, and isolate the stroke simulator.
Because the engine isn't constantly running and providing a vacuum that can be used to assist with brake force, the system also includes a 'brake accumulator' and pump to boost the brake pressure.
It's a pain in the backside to run on Windows, for two reasons. Firstly, Windows doesn't have (by default) a lot of the tools that are preinstalled in most *nix environments. Git for Windows ships half a Cygwin distribution (MSYS2), including Bash, Perl, and Tcl.
Secondly, Windows doesn't really have a 'fork' API. Creating a new process on Windows is a heavyweight operation compared to *nix, so scripts that repeatedly invoke other commands are sluggish. Converting them to C and calling the plumbing commands in-process has a radical effect on performance.
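To illustrate (a hypothetical C sketch, not Git's actual code): the primitive on Windows is CreateProcess, which builds an entire new process from an executable image, so a script loop pays the full setup/teardown cost on every iteration.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        STARTUPINFOA si = { sizeof si };     /* cb must be set; the rest zeroed */
        PROCESS_INFORMATION pi;
        /* CreateProcessA may modify the command line, so it must be writable */
        char cmdline[] = "git rev-parse HEAD";

        if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE,
                            0, NULL, NULL, &si, &pi)) {
            fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
            return 1;
        }
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return 0;
    }

There's no cheaper fork() to fall back on, which is why moving the loop body into git itself pays off.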
Git for Windows is more of a maintained fork than a real first-class platform.
Also, I believe it's a goal to make it possible to use Git as a library rather than as an executable. That's hard to do if half the logic is in a random scripting language. Library implementations exist - notably libgit2 - but they can never be fully up to date with the original. Search for 'git libification'.
Many IDEs started their Git integration with libgit2, but subsequently fell foul of things that libgit2 can't do or does inconsistently, so they fell back on executing `git` with some fixed-format output.
I don't get why everything needs to be a library. Using the OS to invoke things gets you parallelism and isolation for free. And when you need to deal with complicated combinations of parameters to an API, it isn't much different from argument parsing, so you might as well do that instead.
You can still wrap the interface to the executable in a library.
"Large Language Models can gall on an aesthetic level because they are IMPish slurries of thought itself, every word ever written dried into weights and vectors and lubricated with the margarine of RLHF." I infer 'IMPish' as meaning 'like Instant Mashed Potato'.
I read that footnote as a somewhat oblique criticism of the two LLMs, rather than of the statistic itself - which may indeed have been fabricated by the LLM, as opposed to an actual statistic somehow dredged from its training data or pulled from a web search.
Strictly, UK teaspoons are 5 ml and tablespoons 15 ml. The metric tablespoons already used in Europe were probably close enough to half an Imperial fluid ounce for it not to matter for most purposes.
My kids' baby bottles were labelled with measurements in metric (30 ml increments) and in both US and Imperial fluid ounces. The cans of formula were supplied with scoops for measuring the powder, which were also somewhere close to 2 tablespoons/one fluid ounce (use one scoop per 30 ml of water). There are dire warnings about not varying the concentration from the recommended amount, but I assume the real tolerance isn't 1-2% - it's more about not varying by 10-20%. My kids seem to have survived, anyway.
> Strictly, UK teaspoons are 5 ml and tablespoons 15 ml.
Well, there's a rabbit hole I wasn't expecting to go down. I knew that Australian tablespoons (20 mL) were significantly different from US tablespoons. I didn't know that UK tablespoons were a whole different beast (14.2 mL), nor did I realize US tablespoons aren't quite 15 mL, even though my tablespoon measures are marked 15 mL. 15 mL is handily 1/16 of a US cup, so it's easy enough to translate to 1/4 cup (4 tbsp) and 1/3 cup (5 tbsp).
Blame the European regulators who decided that it was no longer necessary to have standard pack sizes.
Pack sizes were regulated in 1975 for volume measures (wine, beer, spirits, vinegar, oils, milk, water, and fruit juice) and in 1980 for weights (butter, cheese, salt, sugar, cereals [flour, pasta, rice, prepared cereals], dried fruits and vegetables, coffee, and a number of other things). In 2007, all of that was repealed - and member states were now forbidden from regulating pack sizes!
I think the rationale was that now the unit price (price per unit of measurement) was mandatory to display, consumers would still know which of two different packs on the same shelf was better value. But standard pack sizes don't just provide value-for-money comparisons, as this article shows.
Ironically, it seems (from memory; I've not researched it deeply) that continental butter has not changed from 250g, whereas the British brands have moved to 200g. I could understand if they'd switched to 225g as essentially a half-pound block, but 200g isn't any closer to a useful Imperial measure than 250g.
It wasn't possible on the 386. Ken Shirriff discusses how the Intel 80386's register file was built at https://www.righto.com/2025/05/intel-386-register-circuitry..... Only four of the registers are built to allow 32-, 16- or 8-bit writes. Reads output the entire register onto the bus, and the ALU does the appropriate masking. The twist is the high-byte registers (AH, BH, CH, DH) of the legacy 16-bit registers - themselves really a legacy of the 8080, and the requirement to be able to directly translate 8080 code opcode-for-opcode. The output of these has to be shifted down 8 bits to be in the right place for the ALU, and then those bits selected.
AMD seem to have decided to regularise the instruction set for 64-bit long mode, making all the registers consistently able to operate as 64-bit, 32-bit, 16-bit, and 8-bit, always using the lowest bits of each register. This only happens when a REX prefix is present - the prefix usually used to select one of the 8 additional architectural registers added for 64-bit mode. To achieve this, the encodings that select the 'high' byte of the legacy 8086 registers in 32- or 16-bit code (and when no REX prefix is used) instead select the lowest 8 bits of the index and pointer registers.
From the "Intel 64 and IA-32 Architectures Software Developer's Manual":
"In 64-bit mode, there are limitations on accessing byte registers. An instruction cannot reference legacy high-bytes (for example: AH, BH, CH, DH) and one of the new byte registers at the same time (for example: the low byte of the RAX register). However, instructions may reference legacy low-bytes (for example: AL, BL, CL, or DL) and new byte registers at the same time (for example: the low byte of the R8 register, or RBP). The architecture enforces this limitation by changing high-byte references (AH, BH, CH, DH) to low byte references (BPL, SPL, DIL, SIL: the low 8 bits for RBP, RSP, RDI, and RSI) for instructions using a REX prefix."
In 64-bit code there is very little reason at all to be using bits 15:8 of a longer register.
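Concretely (Intel syntax; a sketch of what assemblers will and won't accept):

    mov sil, al        ; needs a REX prefix: SIL is bits 7:0 of RSI
    mov ah, al         ; fine: no REX prefix, legacy high-byte encoding
    mov ah, sil        ; won't assemble: AH requires no REX, SIL requires one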
This possibly puts another spin on Intel's desire to remove legacy 16- and 32-bit support (termed 'X86S'). It would remove the need to support AH, BH, CH and DH - and therefore some of the complex wiring from the register file to support the shifting. If that's what it currently does.
Actually, looking at Agner Fog's optimisation tables (https://www.agner.org/optimize/instruction_tables.pdf) it appears there is significant extra latency in using AH/BH/CH/DH, which suggests to me that the processor actually implements shifting into and out of the high byte using extra micro-ops.
No BSWAP r16 exists. Why? In 32-bit mode, it was not needed, because you could simply use
XCHG r/m8, r8
with, say, cl and ch (to swap the endianness of cx).
In 64-bit mode, you can thus only swap the endianness of a 16-bit value in one instruction for the "old" registers ax, cx, dx, bx. If you want to swap the 16-bit part of one of the "new" registers, you at least have to do a logical right shift by 16 (SHR r32, 16) after a BSWAP r32 (EDIT: jstarks pointed out that you could also use ROL r/m16, 8 to do this in one instruction on x86-64). By the way, this solution has a pitfall compared to BSWAP: BSWAP preserves the flags register, while SHR does not.
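Putting that together (Intel syntax):

    ; "Old" register: one instruction, works in 32- and 64-bit mode.
    xchg ch, cl          ; swap the two bytes of CX

    ; "New" register: no high-byte register exists, so either
    bswap r8d            ; reverse all four bytes of R8D...
    shr   r8d, 16        ; ...and shift the wanted pair back down (clobbers flags)

    ; or, as jstarks noted, in one instruction:
    rol   r8w, 8         ; rotate the 16-bit register by one byte (this also writes CF)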
No, SDRAM means Synchronous DRAM, where the data is clocked out of the DRAM chips instead of just appearing on the bus some time after the Column Address Strobe is asserted. Clocking it means that the data doesn't appear before the CPU (or other bus master) is ready to receive it, and that it doesn't disappear before the CPU has read it.
Static RAM (SRAM) is a circuit that retains its data as long as the power is supplied to it. Dynamic RAM (DRAM) must be refreshed frequently. It's basically a large array of tiny capacitors which leak their stored charge through imperfect transistor switches, so a charged capacitor must be regularly recharged. You would think that you would need to read the bit and rewrite its value in a second cycle, but it turns out that reading the value is itself a destructive operation and requires the chip to internally recharge the capacitors.
Further, the chip is organised in rows and columns - generally there are the same number of Sense Amplifiers as columns, with a whole row of cells discharging into their corresponding Sense Amplifiers on each read cycle, the Sense Amplifiers then being used to recharge that row of cells. The column signals select which Sense Amplifier is connected to the output. So you don't need to read every row and column of a chip, just some column on every row. The Sense Amplifier is a circuit that takes the very tiny charge from the cell transistor and brings it up to a stable signal voltage for the output.
So why use DRAM at all if it has this need to be constantly refreshed? Because the Static RAM circuit requires 4-6 transistors per cell, while DRAM only requires 1. You get close to 4-6 times as much storage from the same number of transistors.
The 80286 was a stop-gap solution until iAPX432 was ready.
The 80386 started as a stop-gap solution until iAPX432 was ready, until someone higher up finally decided to kill that one.