I know nothing about VMs or filesystems, but I really enjoyed this article. The language was clear and easy to follow. I'll be following the blog from now on.
I have a question about the copy-on-write example involving VM A and VM B. It says that VM B will directly use all the data from VM A, and for any change it copies the block, writes into the copy, and reads from that copy afterwards.
But what if, say, block 2 is changed by VM A and was never written to by VM B? Wouldn't VM B read the changed block 2? Clearly that doesn't happen, because a fork is a copy, but an explanation of how this is handled would be appreciated!
Yes, and this is also the biggest challenge. Right now we use XFS to enable CoW, but that quickly leads to filesystem fragmentation. I'm still looking for a way to quickly let both VM A and VM B use the same base snapshot and write new changes to either anonymous memory or a file.
So let's say VM A is already running and is cloned to VM B. When that happens, we freeze the memory of VM A and link it to VM B. For both VM A and VM B, any new write goes to a new layer.
So the logic is to check whether VM A has been forked. If it has, start CoW to a new layer of blocks and leave the current layer linked to VM B. If it hasn't, skip CoW and write in place.
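To make the layering concrete, here is a minimal in-memory sketch of the scheme described above. It is only an illustration, not how the service actually implements it: the `Layer` and `VM` classes, the `fork` method, and `BLOCK_SIZE` are hypothetical names, and blocks are kept in Python dicts instead of files or anonymous memory. The key point for the question: on a fork the current layer is frozen, and each VM gets its own fresh writable layer on top of it, so a later write by VM A to block 2 never becomes visible to VM B.

```python
BLOCK_SIZE = 4096  # hypothetical block size; unwritten blocks read as zeros


class Layer:
    """A set of block overrides stacked on an optional older (parent) layer."""

    def __init__(self, parent=None):
        self.blocks = {}      # block index -> bytes written in this layer
        self.parent = parent  # older layer consulted when a block is missing here
        self.frozen = False   # frozen layers are shared and must not change

    def read(self, idx):
        # Walk from the newest layer down to the base until the block is found.
        layer = self
        while layer is not None:
            if idx in layer.blocks:
                return layer.blocks[idx]
            layer = layer.parent
        return bytes(BLOCK_SIZE)


class VM:
    """A VM that sees a private writable top layer over shared frozen layers."""

    def __init__(self, name, top):
        self.name = name
        self.top = top

    def read(self, idx):
        return self.top.read(idx)

    def write(self, idx, data):
        # Without a fork, the top layer is private, so we write in place (no CoW).
        assert not self.top.frozen, "writes must never touch a shared layer"
        self.top.blocks[idx] = data

    def fork(self, name):
        # Freeze the current top layer; parent and child each get a new empty
        # layer on top of it, so later writes by one are invisible to the other.
        base = self.top
        base.frozen = True
        self.top = Layer(parent=base)
        return VM(name, Layer(parent=base))


if __name__ == "__main__":
    a = VM("A", Layer())
    a.write(2, b"old")
    b = a.fork("B")
    a.write(2, b"new")  # lands in A's new layer only
    print(a.read(2))    # b'new'
    print(b.read(2))    # b'old'
```

One trade-off this sketch makes visible: every fork adds a layer, so reads may walk a growing chain of parents, which is why real systems usually flatten or merge layers at some point.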