
Seems like overkill, no? OTel collectors are fairly cheap, so why add expensive Kafka into the mix? If you need to buffer, why not just dump to S3 or a similar data store as temporary storage?


> If you need to buffer, why not just dump to S3 or a similar data store as temporary storage?

At that point it's very easy to sleepwalk into implementing your own database on top of S3, which is very hard to get good semantics out of: it offers essentially no ordering guarantees, and forget about atomicity. For telemetry you might well be OK with fuzzy data, but if you want exact traces every time, then Kafka could make sense.
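
For illustration, here's roughly what that naive "S3 as a queue" loop ends up looking like with boto3 (a sketch; the bucket name and process() are made up). list_objects_v2 returns keys in lexicographic order, not arrival order, and there's no atomic claim-and-delete, so crashes or concurrent consumers mean reordering and duplicates:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "telemetry-buffer"  # hypothetical bucket name

    def process(body: bytes) -> None:
        ...  # stand-in for real ingestion

    def poll_once(after_key: str = "") -> str:
        resp = s3.list_objects_v2(Bucket=BUCKET, StartAfter=after_key)
        for obj in resp.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            process(body)
            # A crash between process() and delete_object() means the object
            # gets processed again on restart: at-least-once, never exactly-once.
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
            after_key = obj["Key"]
        return after_key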


Yeah, and to use S3 efficiently you also need to batch your messages into large blobs of at least tens of MB, which further complicates matters, especially if you don't want to lose those in-memory message buffers.
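
A minimal sketch of that batching, assuming boto3 and made-up names for the bucket and thresholds; note that the flip side is that the in-memory buffer below is exactly what you lose if the process dies before a flush:

    import io
    import time
    import uuid

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "otel-buffer"          # hypothetical
    FLUSH_BYTES = 32 * 1024 * 1024  # ~32 MB blobs, per the "tens of MB" advice
    FLUSH_SECS = 60                 # age limit so low-volume data isn't held forever

    class BlobBuffer:
        def __init__(self):
            self.buf = io.BytesIO()
            self.started = time.monotonic()

        def append(self, msg: bytes) -> None:
            self.buf.write(msg + b"\n")
            if (self.buf.tell() >= FLUSH_BYTES
                    or time.monotonic() - self.started >= FLUSH_SECS):
                self.flush()

        def flush(self) -> None:
            if self.buf.tell() == 0:
                return
            key = f"buffer/{time.time():.0f}-{uuid.uuid4()}.ndjson"
            s3.put_object(Bucket=BUCKET, Key=key, Body=self.buf.getvalue())
            # Anything appended but not yet flushed lives only in this
            # process's memory, which is the loss risk mentioned above.
            self.buf = io.BytesIO()
            self.started = time.monotonic()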


This is for when your OTel collector is being overwhelmed. In that case you have a lot of backlogged data that can't be ingested, so you dead-letter-queue it to S3 to free up buffers.

The approach here is to only send data to S3 as a last-ditch resort.
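
Something like this sketch, where export_batch, BackpressureError, and the bucket name are all hypothetical stand-ins for whatever the real pipeline uses:

    import json
    import time
    import uuid

    import boto3

    s3 = boto3.client("s3")
    DLQ_BUCKET = "otel-dlq"  # hypothetical

    class BackpressureError(Exception):
        """Raised by the (hypothetical) primary exporter when it can't keep up."""

    def export_batch(batch: list) -> None:
        ...  # stand-in for the normal export pipeline

    def send(batch: list) -> None:
        try:
            export_batch(batch)
        except BackpressureError:
            # Primary path is overwhelmed: spill to S3 to free buffers,
            # then replay these objects once the system is healthy again.
            key = f"dlq/{time.time():.0f}-{uuid.uuid4()}.json"
            s3.put_object(Bucket=DLQ_BUCKET, Key=key,
                          Body=json.dumps(batch).encode())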


If you're ok with losing some data when your collectors are overwhelmed, surely you'd just drop overflowing data in that case? Why go to all the effort of building a fallback ingestion path if it's not going to be reliable?


It's hard to see S3 working as a buffer. Every datastore can cover almost any storage use case (buffer, queue, DB) when the scale is low, but purpose-built buffers and queues were designed to keep working at scale.


I really like this idea. And there is an OTel exporter to AWS S3, still in alpha, but I'm gonna test it soon: https://github.com/open-telemetry/opentelemetry-collector-co...


(WarpStream founder)

This is more or less exactly what WarpStream is: https://www.warpstream.com/blog/minimizing-s3-api-costs-with...

Kafka API, S3 costs, and ease of use.


Love the website, looks good, clear and to the point.


Why not both? Dump to S3 and write pointers to Kafka for portable event-based ingestion (since everybody does messages a bit differently).
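
That's the classic claim-check pattern. A rough sketch with boto3 and kafka-python, where the bucket, topic, and broker address are made up:

    import json
    import uuid

    import boto3
    from kafka import KafkaProducer

    s3 = boto3.client("s3")
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # hypothetical broker
        value_serializer=lambda v: json.dumps(v).encode(),
    )
    BUCKET = "telemetry-blobs"  # hypothetical

    def publish(event_id: str, payload: bytes) -> None:
        key = f"events/{event_id}/{uuid.uuid4()}.bin"
        s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
        # Kafka carries only a small pointer; consumers that "do messages
        # differently" can each fetch and decode the blob as they like.
        producer.send("telemetry-pointers", {
            "event_id": event_id,
            "s3_bucket": BUCKET,
            "s3_key": key,
        })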


No need, as the S3 objects are your dead letter queue, and the system should be designed to cope with multiple writes of the same event anyway.

The point is to only use S3 etc. in the event of system instability, not as the primary means of data transfer.
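
For the "multiple writes of the same event" part, a tiny sketch of the idempotent-consumer side; the in-memory set and ingest() are illustrative only (a real system would use a persistent, TTL'd store):

    seen = set()

    def ingest(event: dict) -> None:
        ...  # stand-in for the real downstream write

    def handle(event: dict) -> None:
        event_id = event["event_id"]  # assumes producers attach a stable ID
        if event_id in seen:
            return  # duplicate delivery (e.g. a replay from S3): drop it
        seen.add(event_id)
        ingest(event)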



