As Jonathan mentioned, we made this decision about 9 months ago. At that time Kinesis wasn't as mature and offered less flexibility around things like the retention period.
Kafka is very reliable (I saw it handle billions of events a day at LinkedIn) and has a huge open-source community around it. At IFTTT, we always prefer to use and contribute to open source (http://engineering.ifttt.com/oss/2015/07/23/open-source/).
I'm assuming that you run Kafka within AWS. Most of the hardware requirements and suggestions I've seen for Kafka are for non-virtualized environments. If you can get into it, could you share some details...
- What is the size of your Kafka cluster?
- What instance types do you use?
- Do you use EBS or ephemeral storage?
- How much do you over-provision to deal with instance loss?
We don't do it for the production database because we don't need it in real time.