> HAOS is absolutely terrible when it faces filesystem corruption. I faced this a few times when my HAOS image crashed. You can't even get it to fsck on boot when it happens.
I don't see this as being unique to HAOS, if a filesystem is corrupted in a certain way then _no OS_ is going to be able to boot.
There are distros that are specifically optimized for the Pi (with, e.g., specific logging choices [1]) that try to avoid these problems (with trade-offs).
I've used something before (I can't remember or find it now, but I think it may have been Yocto-based) where there were 3 partitions: one was read-write and used for persisting user data, and two were for the OS/apps. One of those would be live and mounted read-only; the other was reserved for the next system update, using a blue-green deploy strategy. I think it also used RAM for log and temp files.
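To make the layout concrete, here is a rough sketch of the blue-green (A/B) idea as a toy Python model. Everything in it is hypothetical (the `Device` class, the slot names, the `boots_ok` health check); it isn't taken from Yocto or any real updater, it just illustrates the flow.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Two OS slots; only the active one is "mounted" (read-only in practice).
    slots: dict = field(default_factory=lambda: {"A": "v1", "B": None})
    active: str = "A"
    # Stand-in for the separate read-write partition holding user data.
    user_data: dict = field(default_factory=dict)

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def install_update(self, version: str) -> None:
        # The new image is written to the inactive slot only,
        # so the running system is never modified in place.
        self.slots[self.inactive] = version

    def switch(self, boots_ok: bool) -> str:
        # Flip to the updated slot only if it exists and passes a
        # (hypothetical) health check; otherwise keep the known-good slot.
        candidate = self.inactive
        if self.slots[candidate] is not None and boots_ok:
            self.active = candidate
        return self.active
```

The point of the design is that a failed or interrupted update leaves the previous slot bootable, and user data lives on a partition that neither slot's update ever touches.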
> I don't see this as being unique to HAOS, if a filesystem is corrupted in a certain way then _no OS_ is going to be able to boot.
Theoretically true if you are very unlucky, but it's not true at all for the vast majority of filesystem failures.
I am talking about failures where if I inspect the disk image outside the VM and fsck it, everything is fine.
Also, given that HA leans so heavily on Docker, it would be a reasonable feature for it to rebuild Docker images when disk corruption damages them.
Traditional hard disk failures are common, and SD card failure is really common too. If you've worked on a sufficiently popular mobile app, you've probably seen tons of it. It's reasonable to plan for it at the application layer, to say nothing of the OS layer. There's a very good reason why Unix traditionally ships with fsck tools and sometimes runs them at boot. Whoever designed HAOS not to do this made a big mistake, and I would hit this a few times a year in my old setup.
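For anyone scripting the "fsck it outside the VM" step, here is a small Python sketch for interpreting the result. The exit status is a bitmask as documented in the fsck(8) man page; the helper names (`decode_fsck_status`, `safe_to_mount`) are made up for this example, and it says nothing about what HAOS itself does at boot.

```python
# Exit-status bits as documented in fsck(8); the status is their sum.
FSCK_FLAGS = {
    1: "errors corrected",
    2: "system should be rebooted",
    4: "errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(code: int) -> list[str]:
    """Translate an fsck exit status into human-readable flags."""
    if code == 0:
        return ["no errors"]
    return [msg for bit, msg in FSCK_FLAGS.items() if code & bit]


def safe_to_mount(code: int) -> bool:
    """Treat the filesystem as usable if nothing beyond 'corrected'
    or 'reboot needed' was reported (an assumption, not a standard)."""
    return (code & ~0b11) == 0
```

For example, `decode_fsck_status(5)` reports both "errors corrected" and "errors left uncorrected", which is a case where an automated boot-time check should stop rather than carry on.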
[1] https://dietpi.com/docs/software/log_system/