
Jepsen started as a personal blog series, written on nights and weekends; jepsen.io dates from when I started doing it professionally, about ten years ago.

Curious: do you have a team of people working with you, or is it mostly solo work? Your work is so valuable that I would be scared for our industry if it had a bus factor of 1.

Yes, good call! You can batch up multiple operations into a single call to fsync. You can also tune the number of milliseconds or bytes you're willing to buffer before calling `fsync` to balance latency and throughput. This is how databases like Postgres work by default--see the `commit_delay` option here: https://www.postgresql.org/docs/8.1/runtime-config-wal.html
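As a rough sketch of the idea (hypothetical Java, not how Postgres implements `commit_delay` internally): buffer appended records and pay for one `fsync` per batch, flushing once either a byte threshold or a time threshold is crossed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical group-commit log: many appends share a single fsync.
class GroupCommitLog {
    private final FileChannel log;
    private final long maxBufferedBytes;
    private final long maxDelayNanos;
    private long bufferedBytes = 0;
    private long lastSyncNanos = System.nanoTime();

    GroupCommitLog(Path path, long maxBufferedBytes, long maxDelayMillis) throws IOException {
        this.log = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.maxBufferedBytes = maxBufferedBytes;
        this.maxDelayNanos = maxDelayMillis * 1_000_000L;
    }

    // Append a record; fsync only when a threshold is crossed, amortizing
    // the sync cost over every record buffered since the last flush.
    synchronized void append(byte[] record) throws IOException {
        log.write(ByteBuffer.wrap(record));
        bufferedBytes += record.length;
        long now = System.nanoTime();
        if (bufferedBytes >= maxBufferedBytes || now - lastSyncNanos >= maxDelayNanos) {
            log.force(false); // one fsync covers all buffered records
            bufferedBytes = 0;
            lastSyncNanos = now;
        }
    }
}
```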

> This is how databases like Postgres work by default--see the `commit_delay` option here: https://www.postgresql.org/docs/8.1/runtime-config-wal.html

I must note that the default for Postgres is that there is NO delay, which is a sane default.

> You can batch up multiple operations into a single call to fsync.

I've done this in various messaging implementations for throughput, and it's actually fairly easy to do in most languages:

Basically, set up 1-N writers (how many depends on how you're storing data) that each consume a queue of items, where each item contains the data to be written alongside a TaskCompletionSource (a Promise, in Java terms). When something wants to write, it pushes an item onto that local queue; the worker(s) drain the queue and write messages out in batches (tuned for write size, number of records, etc., for both throughput and guaranteed forward progress); and when the write completes, you either complete or fail the TCS/Promise.

If you've got the right 'glue' in your language/libraries, it's not that hard; this example [0] from Akka.NET's SQL persistence layer shows how simple the actual write processor's logic can be. You do have to think about queueing a little, but I've found this basic pattern very adaptable (e.g. the queueing op can just send a batch of ready-to-go bytes and you work off that for the threshold instead, adding framing if needed). A simplified sketch of the pattern follows below.

[0] https://github.com/akkadotnet/Akka.Persistence.Sql/blob/7bab...
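
A minimal sketch of that pattern (Java here, with CompletableFuture standing in for the TCS/Promise; class names and the storage stub are just for illustration): callers enqueue a payload plus a future, a single writer thread drains the queue into batches, performs one write-and-sync per batch, and then completes or fails every future in that batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// One pending write: the payload plus the promise the caller is awaiting.
record PendingWrite(byte[] payload, CompletableFuture<Void> done) {}

class BatchedWriter {
    private final BlockingQueue<PendingWrite> queue = new LinkedBlockingQueue<>();
    private final int maxBatchSize;

    BatchedWriter(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
        Thread worker = new Thread(this::run, "batched-writer");
        worker.setDaemon(true);
        worker.start();
    }

    // Producers call this: enqueue and get back a future that completes
    // once the record has been durably written as part of some batch.
    CompletableFuture<Void> write(byte[] payload) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        queue.add(new PendingWrite(payload, done));
        return done;
    }

    private void run() {
        List<PendingWrite> batch = new ArrayList<>();
        while (true) {
            try {
                batch.add(queue.take());                // block for the first item
                queue.drainTo(batch, maxBatchSize - 1); // grab whatever else is ready
                writeAndSync(batch);                    // one write + fsync per batch
                batch.forEach(w -> w.done().complete(null));
            } catch (Exception e) {
                batch.forEach(w -> w.done().completeExceptionally(e));
            } finally {
                batch.clear();
            }
        }
    }

    // Stub: append every payload to storage, then sync once (storage-specific).
    private void writeAndSync(List<PendingWrite> batch) { /* ... */ }
}
```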


Ah, pardon me, spoke too quickly! I remembered that it fsynced by default, and offered batching, and forgot that the batch size is 0 by default. My bad!

Well, the write is still tunable, so you are still correct.

Just wanted to clarify that the default is still at least safe, in case people perusing this for things to worry about were, well, thinking about worrying.

Love all of your work and writings, thank you for all you do!


In some contexts (interrupts) we would call this "coalescing." (I don't work in databases, can't comment about terminology there.)

Author here--from discussions with Capela's team, I think this sort of early testing can be remarkably helpful, because it gives the team a test suite they can check their work against as they move forward.

I would suggest against this kind of integration test when the data model or API is in constant flux, because then you have to re-write or even re-design the test as the API changes. Small changes--adding fields or features, changing HTTP paths or renaming fields--are generally easy to keep up with, but if there were, say, a redesign that removed core operations or changed the fundamental semantics, it might require extensive changes to the test suite.


It does indeed! This is a part of https://github.com/jepsen-io/elle, which infers totally-connected components of the transaction dependency graph. :-)


If I might add on to what you and Joran are both saying, after some time working with TigerBeetle, I found it useful to think of Protocol-Aware Recovery as similar to TAPIR (https://syslab.cs.washington.edu/papers/tapir-tr14.pdf). Normally we build distributed systems on top of clean abstraction layers, like "Nodes are pure state machines that do not corrupt or forget state", or "the transaction protocol assumes each key is backed by a sequentially-consistent system like a Paxos state machine". TAPIR and PAR show a path for building a more efficient, or more capable, system, by breaking the boundaries between those layers and coupling them together.


And honestly--designing a generator for a system like this is hard. Really hard. I struggled for weeks to get something that didn't just fail 99% of requests trivially, and it's an (ahem) giant pile of probabilistic hacks. So I wouldn't be too hard on the various TB test generators here!

https://github.com/jepsen-io/tigerbeetle/blob/main/src/jepse...


Yeah, TigerBeetle's blog post goes into more detail here, but in short, the tests that were running in Antithesis (which were remarkably thorough) didn't happen to generate the precise combination of intersecting queries and out-of-order values that were necessary to find the index bug, whereas the Jepsen generator did hit that combination.

There are almost certainly blind spots in the Jepsen test generators too--that's part of why designing different generators is so helpful!


Thanks for your answer, aphyr, and for this amazing analysis.


To build on this--this is something of a novel technique in Jepsen testing! We've done arbitrary state machine verification before, but usually that requires playing forward lots of alternate timelines: one for each possible ordering of concurrent operations. That search (see the Knossos linearizability checker) is an exponential nightmare.

In TigerBeetle, we take advantage of some special properties to make the state machine checking part linear-time. We let TigerBeetle tell us exactly which transactions happen. We can do this because it's a.) strong serializable, b.) immutable (in that we can inspect DB state to determine whether an op took place), and c.) exposes a totally ordered timestamp for every operation. Then we check that that timestamp order is consistent with real-time order, using a linear-time cycle detection approach called Elle. Having established that TigerBeetle's claims about the timestamp order are valid, we can apply those operations to a simulated version of the state machine to check semantic correctness!

I'd like to generalize this to other systems, but it's surprisingly tricky to find all three of those properties in one database. Maybe an avenue for future research!
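
As a rough sketch of that checking shape (Java here rather than the actual Clojure, with hypothetical names; the real real-time check uses Elle's linear-time cycle detection, whereas the pairwise scan below just illustrates the property being checked): each completed operation carries the database-assigned timestamp plus its real-time invocation/completion bounds, we verify the timestamp order never contradicts real-time order, and then replay the operations in timestamp order against a simulated model.

```java
import java.util.Comparator;
import java.util.List;

// A completed operation: the DB-assigned timestamp, plus the real-time
// window [invokedAt, completedAt] observed by the test harness.
record Op(long dbTimestamp, long invokedAt, long completedAt, String type, Object value) {}

class TimestampOrderChecker {
    // A simulated (model) state machine.
    interface Model {
        void apply(Op op); // throws if the result is semantically impossible
    }

    static void check(List<Op> ops, Model model) {
        List<Op> byTimestamp = ops.stream()
                .sorted(Comparator.comparingLong(Op::dbTimestamp))
                .toList();

        // 1. Real-time consistency: if A completed before B was invoked, A's
        //    timestamp must be smaller. (Elle does this in linear time via
        //    cycle detection; this O(n^2) scan only demonstrates the idea.)
        for (int i = 0; i < byTimestamp.size(); i++) {
            for (int j = i + 1; j < byTimestamp.size(); j++) {
                Op earlier = byTimestamp.get(i), later = byTimestamp.get(j);
                if (later.completedAt() < earlier.invokedAt()) {
                    throw new AssertionError(
                        "timestamp order contradicts real-time order: " + earlier + " vs " + later);
                }
            }
        }

        // 2. Semantic correctness: replay in timestamp order against the model.
        for (Op op : byTimestamp) {
            model.apply(op);
        }
    }
}
```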


This is a whole story unto itself. I (and a bunch of other small-site operators) actually got the chance to ask Ofcom directly about how small a site would have to be to declare itself out-of-scope of part 3/5 services, and in short: the law doesn't specify, Ofcom declines to say, and it could be as small as "a one-man band". Ofcom has indicated that they're interested in pursuing aggressive enforcement against small service providers (read: web sites). I wound up concluding that the forum I help run could be defensibly out-of-scope, but aphyr.com gets significantly more traffic, and I'm trying to limit risks.

I think in total I spent something like 80 hours reading the legislation, working through thousands of pages of Ofcom guidance, and corresponding with Ofcom trying to figure out whether and how compliance was feasible.

https://blog.woof.group/announcements/updates-on-the-osa


I made a lot of my furniture--it's hard to replace.

