
> Production HBase clusters typically used a primary-standby setup with six data replicas for fast disaster recovery, which, however, came at an extremely high infra cost at our scale.

This is the real killer. HBase uses Hadoop for its replication. That replication does 3 copies in the same data center. If you're a company that requires online data of this scale, you probably also have several other data centers or clouds. Having to replicate entire datasets 3x for every new data center is cost-prohibitive.
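For reference, the 3x comes from HDFS's dfs.replication setting, which every HDFS block HBase writes (HFiles and WALs alike) inherits; keeping a full copy of the data in each additional data center or cloud then multiplies that. A minimal sketch with the stock Hadoop client API, with a made-up file path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationCostSketch {
      public static void main(String[] args) throws Exception {
        // dfs.replication is the in-datacenter copy count for every HDFS
        // block HBase writes (store files and WALs alike).
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);
        FileSystem fs = FileSystem.get(conf);

        // An existing file's factor can also be changed after the fact,
        // e.g. for a hypothetical store file path:
        fs.setReplication(new Path("/hbase/data/default/t1/r1/cf/hfile1"), (short) 1);

        // With N independent clusters (one per data center or cloud), the
        // dataset exists as roughly 3 * N physical copies, which is the
        // cost the parent comment is describing.
      }
    }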

That, along with the fact that HBase has trouble keeping up at NVMe speeds and throughputs, is a real issue.



> That replication does 3 copies in the same data center.

You can set the replication factor to 1 if you want. You'll just have a high chance of losing your data forever.


The HBase write-ahead log requires a full replication pipeline, and it speculatively drops Hadoop datanodes out of that pipeline. Because of the complexity of the WAL, I don't know anyone who has tried running it with fewer replicas. (There are a good number of ways to run HBase with no write-ahead log at all.)
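One of those ways, as a minimal sketch with the standard HBase client API (the table, family, and row names are made up): setting Durability.SKIP_WAL per mutation trades the WAL's crash safety for write throughput.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SkipWalSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("t1"))) {
          Put put = new Put(Bytes.toBytes("row1"));
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
          // No WAL entry at all: this edit is lost if the region server
          // dies before the memstore flushes.
          put.setDurability(Durability.SKIP_WAL);
          table.put(put);
        }
      }
    }

The same thing can also be set table-wide through the table descriptor's durability attribute rather than per Put.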

There are people who run HBase with Reed-Solomon erasure coding on the HFiles (the store files). That can get the effective replication factor below 3. I don't think that Pinterest ever ran a new enough Hadoop for that to be available.
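For reference, that route relies on HDFS erasure coding, a Hadoop 3.x feature (hence the "new enough Hadoop" caveat): you tag a directory with an EC policy and files written under it are Reed-Solomon striped instead of 3x replicated. A minimal sketch with a hypothetical path; EC files don't support hflush/append, which is why this suits store files rather than the WAL:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class ErasureCodingSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (DistributedFileSystem dfs =
                 (DistributedFileSystem) new Path("/").getFileSystem(conf)) {
          // RS(6,3): 6 data + 3 parity blocks, ~1.5x storage instead of 3x.
          dfs.setErasureCodingPolicy(new Path("/hbase/archive"), "RS-6-3-1024k");
        }
      }
    }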

> You can set replication factor 1 if you want.

You can't, really. HBase will fail to open regions whose files can't be read. So with a replication factor of 1, a single hard drive failure leaves the table with regions that can never be opened: the NameNode reports the file as having lost blocks, and the HMaster passes the region assignment from server to server until it finally sticks in FAILED_OPEN.



