The combination of transactions, isolation levels, and MVCC is such a huge undertaking to cover all at once, especially when comparing how it's done across multiple DBs, which I attempted here. It's always a balance between technical depth, accessibility to people with less experience, and not letting it turn into an hour-long read.
I actually like this article a lot. I do a bit of teaching, and I imagined the ideal audience for this as a smart junior engineer who knows SQL and has encountered transactions but maybe doesn’t really understand them yet. I think introducing things via examples of isolation anomalies (which most engineers will have seen examples of in bugs, even if they didn’t fully understand them) gives the explanation a lot more concreteness than starting with serializability as a theoretical concept as GP is proposing. Sure, strict serializability is a powerful idea that ties all this together and is more satisfying for an expert who already knows this stuff. But for someone who is just learning, you have to motivate it first.
If anything, I’d say it might be better to start with the lower isolation levels first, highlight the concurrency problems that can arise with them, and gradually introduce higher isolation levels until you get to serializability. That feels a bit more intuitive than the downward progression from serializability to read uncommitted presented here.
It also might be nice to see a quick discussion of why people choose particular isolation levels in practice, e.g. why you might make a tradeoff under high concurrency and give up serializability to avoid waits and deadlocks.
But excellent article overall, and great visualizations.
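To make the anomaly-first idea concrete, here is a rough sketch of write skew, the classic anomaly that snapshot isolation allows and serializability rules out. It is a toy model in plain Python, not how any real database implements MVCC; the on-call scenario and the names `run_snapshot` and `run_serial` are made up for illustration.

```python
# Toy model only: dicts stand in for table rows, and "transactions" are
# plain function steps. Real engines track versions and conflicts differently.

def run_snapshot(db):
    """Both transactions read the same starting snapshot, then commit."""
    snapshot_a = dict(db)  # what T1 sees
    snapshot_b = dict(db)  # what T2 sees (same starting state)
    writes = {}

    # T1: Alice goes off call if at least two doctors are on call.
    if sum(snapshot_a.values()) >= 2:
        writes["alice"] = False
    # T2: Bob applies the same rule against his own snapshot.
    if sum(snapshot_b.values()) >= 2:
        writes["bob"] = False

    # The write sets are disjoint, so a write-write conflict check
    # (first committer wins) sees nothing to reject: both commit.
    result = dict(db)
    result.update(writes)
    return result


def run_serial(db):
    """Same two transactions, run one after the other."""
    state = dict(db)
    for doctor in ("alice", "bob"):
        if sum(state.values()) >= 2:
            state[doctor] = False
    return state


on_call = {"alice": True, "bob": True}
snapshot_result = run_snapshot(on_call)
serial_result = run_serial(on_call)

print("snapshot:", snapshot_result)
print("serial:  ", serial_result)
```

Under the snapshot model nobody is left on call, while any serial order keeps one doctor on call. That gap is the kind of bug a junior engineer may have seen in the wild, which is why it works as a motivating example before serializability is introduced.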
Similar to how in the "POSIX threads" section you indicate that pthread_create() is UNIX/POSIX-specific, you should probably indicate in the "Creating new Processes" section that fork() and execve() are Unix/POSIX-specific. That, or you could indicate that you are describing generally in that context.
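For what it's worth, the platform dependence is easy to show. Below is a minimal sketch using Python's `os` module, which exposes `fork()` only on Unix-like systems; the `spawn` helper and the exit code 7 are my own illustrative choices, not something from the article.

```python
import os
import sys


def spawn(argv):
    """Run argv[0] in a child process via fork + execve; return its exit code."""
    if not hasattr(os, "fork"):
        # os.fork does not exist on Windows builds of Python.
        raise NotImplementedError("fork() is only available on Unix-like systems")

    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execve(argv[0], argv, os.environ)
        finally:
            # Only reached if execve failed; exit without running parent cleanup.
            os._exit(127)

    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


# Hypothetical usage: run a tiny Python child that exits with code 7.
if hasattr(os, "fork"):
    exit_code = spawn([sys.executable, "-c", "raise SystemExit(7)"])
else:
    exit_code = None  # no fork() on this platform

print("child exit code:", exit_code)
```

On a system without `fork()` the same job is usually done with a single "create process" call, so a short note in that section saying the fork/exec split is a Unix convention would cover it.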
In "Running multiple programs" you state: "After a few seconds, it pauses, saves the state of the RAM, and changes processes." Saying it "saves the state of the RAM" is probably not a good characterization, since it is more likely just changing which memory is accessible. It is also most likely changing the virtual memory mappings to give the illusion of using the same memory addresses (though it does not have to; for instance, a system with an MPU still has a concept of distinct programs even though virtual memory mappings are not supported), but you could defer that to the later virtual memory section without being incorrect.
Yeah, I purposefully "simplified" how virtual memory works here so that the focus could be on multitasking and context switching, but agree that isn't totally accurate. I actually had hoped to do a part II on virtual memory, but we'll see if time permits that :)
Still in virtual machines, but ones with local NVMe drives rather than network-attached storage (EBS, Persistent Disk). This means incredible I/O performance.