- No out-of-the-box horizontal sharding. According to the post they had 4 TB of compressed data in the cluster in 2017; looking at their growth, it is safe to assume they would have >50 TB today, which can't be done on a single node. You could use Citus, but then it's not exactly vanilla Postgres anymore (see the first sketch after this list). For such a simple data model it makes no sense to waste time implementing your own sharding solution and (more importantly) shard migration.
- Discord is storing text data; in Postgres this will end up in TOAST tables, which has some drawbacks.
- Their workload is mostly inserts with almost no updates. Vacuum only operates on whole tables, so you would waste I/O and CPU processing data you never even touch. You can partition tables (see the second sketch after this list), but it's a manual process and you have to make compromises; in 2017, Postgres partitioning still had many performance drawbacks.
- No out-of-the-box redundancy.
- Once your data doesn't fit in memory, Postgres performance becomes unpredictable.
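For the sharding point, Citus hides the distribution behind a normal-looking table. A minimal sketch, assuming a hypothetical `messages` table keyed by channel (all names here are made up for illustration):

```sql
-- Sketch only: Citus is an extension, so this is no longer vanilla Postgres.
CREATE EXTENSION citus;

CREATE TABLE messages (
    channel_id  bigint      NOT NULL,
    message_id  bigint      NOT NULL,
    author_id   bigint      NOT NULL,
    content     text,
    created_at  timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (channel_id, message_id)
);

-- Shard by channel_id so one channel's history stays on a single worker node.
SELECT create_distributed_table('messages', 'channel_id');
```

For the vacuum point, a rough sketch of time-based declarative partitioning (available since Postgres 10, so only just arriving in 2017); table and partition names are again assumed:

```sql
-- Range-partition by time so autovacuum only has to touch the partitions
-- that actually receive writes; old partitions stay untouched.
CREATE TABLE messages (
    channel_id  bigint      NOT NULL,
    message_id  bigint      NOT NULL,
    content     text,
    created_at  timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month has to be created (and eventually dropped) by hand
-- or by an external tool -- this is the manual process mentioned above.
CREATE TABLE messages_2017_01 PARTITION OF messages
    FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');
CREATE TABLE messages_2017_02 PARTITION OF messages
    FOR VALUES FROM ('2017-02-01') TO ('2017-03-01');
```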
Personally I would have chosen ElasticSearch for this project.
My understanding was that PG only uses TOAST when a value is too large to fit in the row, and since PG compresses data before moving it out of line, wouldn't user messages be fine?
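For reference, one way to check on an actual table is to compare the logical size of the column with what Postgres stores after compression/TOAST (the roughly 2 kB per-tuple threshold only kicks in for large rows); the table and column names below are assumed:

```sql
-- Logical size vs. what Postgres actually keeps (compressed / out-of-line).
SELECT
    octet_length(content)   AS logical_bytes,
    pg_column_size(content) AS stored_bytes
FROM messages
ORDER BY logical_bytes DESC
LIMIT 10;
```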
Do you have any case studies for ElasticSearch that you can recommend, in projects similar to this? Would be very interested in seeing what that option would look like.