
Seems like overkill, no? OTel collectors are fairly cheap, so why add expensive Kafka into the mix? If you need to buffer, why not just dump to S3 or a similar data store as temporary storage?


> If you need to buffer, why not just dump to S3 or a similar data store as temporary storage?

At that point it's very easy to sleepwalk into implementing your own database on top of S3, which is very hard to get good semantics out of: it offers essentially no ordering guarantees, and forget about atomicity. For telemetry you might well be OK with fuzzy data, but if you want exact traces every time, then Kafka could make sense.
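
For illustration, here's roughly what that naive "S3 as a queue" loop ends up looking like with boto3 (a sketch; the bucket name and process() are made up). list_objects_v2 returns keys in lexicographic order, not arrival order, and there's no atomic claim-and-delete, so crashes or concurrent consumers mean reordering and duplicates:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "telemetry-buffer"  # hypothetical bucket name

    def process(body: bytes) -> None:
        ...  # stand-in for real ingestion

    def poll_once(after_key: str = "") -> str:
        resp = s3.list_objects_v2(Bucket=BUCKET, StartAfter=after_key)
        for obj in resp.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            process(body)
            # A crash between process() and delete_object() means the object
            # gets processed again on restart: at-least-once, never exactly-once.
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
            after_key = obj["Key"]
        return after_key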


Yeah, and to use S3 efficiently you also need to batch your messages into large blobs of at least tens of MB, which further complicates matters, especially if you don't want to lose those in-memory message buffers.
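
A minimal sketch of that batching, assuming boto3 and made-up names for the bucket and thresholds; note that the flip side is that the in-memory buffer below is exactly what you lose if the process dies before a flush:

    import io
    import time
    import uuid

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "otel-buffer"          # hypothetical
    FLUSH_BYTES = 32 * 1024 * 1024  # ~32 MB blobs, per the "tens of MB" advice
    FLUSH_SECS = 60                 # age limit so low-volume data isn't held forever

    class BlobBuffer:
        def __init__(self):
            self.buf = io.BytesIO()
            self.started = time.monotonic()

        def append(self, msg: bytes) -> None:
            self.buf.write(msg + b"\n")
            if (self.buf.tell() >= FLUSH_BYTES
                    or time.monotonic() - self.started >= FLUSH_SECS):
                self.flush()

        def flush(self) -> None:
            if self.buf.tell() == 0:
                return
            key = f"buffer/{time.time():.0f}-{uuid.uuid4()}.ndjson"
            s3.put_object(Bucket=BUCKET, Key=key, Body=self.buf.getvalue())
            # Anything appended but not yet flushed lives only in this
            # process's memory, which is the loss risk mentioned above.
            self.buf = io.BytesIO()
            self.started = time.monotonic()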


This is for when your OTel collector is being overwhelmed. In that case you have a lot of backlogged data that can't be ingested, so you dead-letter-queue it to S3 to free up buffers.

The approach here is to only send data to S3 as a last-ditch resort.
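
Something like this sketch, where export_batch, BackpressureError, and the bucket name are all hypothetical stand-ins for whatever the real pipeline uses:

    import json
    import time
    import uuid

    import boto3

    s3 = boto3.client("s3")
    DLQ_BUCKET = "otel-dlq"  # hypothetical

    class BackpressureError(Exception):
        """Raised by the (hypothetical) primary exporter when it can't keep up."""

    def export_batch(batch: list) -> None:
        ...  # stand-in for the normal export pipeline

    def send(batch: list) -> None:
        try:
            export_batch(batch)
        except BackpressureError:
            # Primary path is overwhelmed: spill to S3 to free buffers,
            # then replay these objects once the system is healthy again.
            key = f"dlq/{time.time():.0f}-{uuid.uuid4()}.json"
            s3.put_object(Bucket=DLQ_BUCKET, Key=key,
                          Body=json.dumps(batch).encode())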


If you're ok with losing some data when your collectors are overwhelmed, surely you'd just drop overflowing data in that case? Why go to all the effort of building a fallback ingestion path if it's not going to be reliable?


It's hard to see S3 working as a buffer. Every datastore can cover almost any storage use case (buffer, queue, DB) when the scale is low, but purpose-built buffers and queues were designed to keep working at scale.


I really like this idea. And there is an OTel exporter to AWS S3, still in alpha, but I'm gonna test it soon: https://github.com/open-telemetry/opentelemetry-collector-co...


(WarpStream founder)

This is more or less exactly what WarpStream is: https://www.warpstream.com/blog/minimizing-s3-api-costs-with...

Kafka API, S3 costs, and ease of use.


Love the website, looks good, clear and to the point.


Why not both? Dump to S3 and write pointers to Kafka for portable event-based ingestion (since everybody does messages a bit differently).
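
That's the classic claim-check pattern. A rough sketch with boto3 and kafka-python, where the bucket, topic, and broker address are made up:

    import json
    import uuid

    import boto3
    from kafka import KafkaProducer

    s3 = boto3.client("s3")
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # hypothetical broker
        value_serializer=lambda v: json.dumps(v).encode(),
    )
    BUCKET = "telemetry-blobs"  # hypothetical

    def publish(event_id: str, payload: bytes) -> None:
        key = f"events/{event_id}/{uuid.uuid4()}.bin"
        s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
        # Kafka carries only a small pointer; consumers that "do messages
        # differently" can each fetch and decode the blob as they like.
        producer.send("telemetry-pointers", {
            "event_id": event_id,
            "s3_bucket": BUCKET,
            "s3_key": key,
        })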


No need, as the S3 objects are your dead letter queue, and the system should be designed to cope with multiple writes of the same event anyway.

The point is to only use S3 etc. in the event of system instability, not as the primary means of data transfer.
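
For the "multiple writes of the same event" part, a tiny sketch of the idempotent-consumer side; the in-memory set and ingest() are illustrative only (a real system would use a persistent, TTL'd store):

    seen = set()

    def ingest(event: dict) -> None:
        ...  # stand-in for the real downstream write

    def handle(event: dict) -> None:
        event_id = event["event_id"]  # assumes producers attach a stable ID
        if event_id in seen:
            return  # duplicate delivery (e.g. a replay from S3): drop it
        seen.add(event_id)
        ingest(event)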



