
Not my experience at all.

We’ve been running a 3-node cluster for several years, and a significant minority of the times I’ve been paged have been because ZK got into a bad state that was fixed by a restart (what bad state exactly? Don’t know, don’t care, don’t have two spare weeks to spend figuring it out). Note that we have proper liveness checks on the individual instances, so the issue is more complicated than that.
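
The commenter doesn’t say what their liveness checks looked like. A common minimal one is ZooKeeper’s "ruok" four-letter-word command; a sketch in Java follows, where the host, port, and timeout are assumptions. It also illustrates why per-instance checks can pass while the ensemble is unhealthy.

    // Minimal sketch of a per-instance ZooKeeper liveness check using the
    // "ruok" four-letter-word command. Host/port are hypothetical; newer
    // ZK versions also require 4lw.commands.whitelist=ruok in zoo.cfg.
    import java.io.IOException;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class ZkLivenessCheck {
        static boolean isAlive(String host, int port) {
            try (Socket s = new Socket(host, port)) {
                s.setSoTimeout(2000); // fail fast if the server hangs
                s.getOutputStream().write("ruok".getBytes(StandardCharsets.US_ASCII));
                s.getOutputStream().flush();
                // A healthy server answers "imok" and closes the connection.
                byte[] resp = s.getInputStream().readNBytes(4);
                return "imok".equals(new String(resp, StandardCharsets.US_ASCII));
            } catch (IOException e) {
                return false; // connection refused, timed out, etc.
            }
        }

        public static void main(String[] args) {
            System.out.println(isAlive("localhost", 2181) ? "ok" : "unhealthy");
        }
    }

Per the ZooKeeper docs, "ruok" answers "imok" whenever the process is running in a non-error state, even if the node has lost quorum, which is consistent with the point above: per-instance checks can pass while the cluster as a whole is in a bad state.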

Migrated to 3.3 with KRaft about half a year ago, and we haven’t had a single issue since. It just runs and we resize the disks from time to time.



> Migrated to 3.3 with KRaft about half a year ago

Did you follow their migration guide, or did you just rebuild the cluster and then start using KRaft? I wasn't sure how "migration" was being used in that context.


We built a new cluster and very carefully rolled over our producers and consumers. It wasn’t simple, but it’s possible to do without downtime.
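
The comment doesn’t spell out the mechanics, but one way to make such a rollover a pure config change is to externalize the bootstrap servers, so cutting producers over to the new cluster is a redeploy rather than a code change. A minimal Java sketch, where the cluster addresses, environment variable name, and topic are all hypothetical:

    // Producer whose target cluster is chosen at deploy time via an
    // environment variable (KAFKA_BOOTSTRAP is a hypothetical name).
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class RolloverProducer {
        public static void main(String[] args) {
            // Flip to the new KRaft cluster's address at cutover time.
            String bootstrap = System.getenv().getOrDefault(
                    "KAFKA_BOOTSTRAP", "old-zk-cluster:9092");

            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrap);
            props.put("acks", "all"); // avoid losing records during the cutover
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "key", "value"));
                producer.flush();
            }
        }
    }

The cutover order also matters in practice (e.g., mirroring existing topics to the new cluster and moving consumers before or after producers, depending on tolerance for duplicates vs. gaps), but the comment doesn’t say which sequence was used.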



