dm03514's comments

Ha, yes! A pipeline assumes a "batch" of data, which is backed by an ephemeral DuckDB in-memory table. The goal is to provide SQL table semantics and implement pipelines in a way where the batch size can be toggled without changing the pipeline logic.

The stream is achieved by the continuous flow of data from Kafka.

SQLFlow exposes a variable for batch size. Setting the batch size to 1 makes SQLFlow read a Kafka message, apply the processor SQL logic, and then ensure it successfully commits the SQL results to the sink, one message after another.

SQLFlow provides at-least-once delivery guarantees: it only commits the source message once it has successfully written to the pipeline output (sink).

https://sql-flow.com/docs/operations/handling-errors

The batch table is just a convention that allows for seamless batch size configuration. If your throughput is low, or if you require message-by-message processing, SQLFlow can be toggled to a batch of 1. If you need higher throughput and can tolerate the latency, the batch size can be toggled higher.
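Not actual SQLFlow code, but the idea can be sketched as a micro-batch loop where batch size is just a knob, and the source offset is committed only after the sink write succeeds (all names here are mine, for illustration):

```python
# Illustrative micro-batch loop with at-least-once semantics:
# the batch size is a config knob; the pipeline logic never changes.

def run_pipeline(source, process, sink, batch_size=1):
    """Drain `source`, processing `batch_size` messages at a time."""
    committed = 0
    batch = []
    for msg in source:
        batch.append(msg)
        if len(batch) == batch_size:
            sink.extend(process(batch))   # write results to the sink first...
            committed += len(batch)       # ...then "commit" the source offset
            batch = []
    if batch:                             # flush any final partial batch
        sink.extend(process(batch))
        committed += len(batch)
    return committed

# Toggling batch_size trades latency for throughput; the logic is untouched.
sink = []
n = run_pipeline(range(10), lambda b: [x * 2 for x in b], sink, batch_size=4)
```

With `batch_size=1` this degenerates to the message-by-message mode described above; a crash between the sink write and the offset commit re-delivers the batch, which is exactly the at-least-once trade-off.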


Oh yes!! I've seen this a couple of times. I am far from an expert in Tributary, so please take this with a grain of salt.

Based on the Tributary documentation, I understand that Tributary embeds Kafka consumers into DuckDB. This makes DuckDB the main process that you run to perform consumption. I think this makes creating stream processing POCs very accessible, and it looks quite easy to start streaming data into DuckDB. What I don't see is a full story around DevOps: operations, testing, configuration as code, etc.

SQLFlow is a service that embeds DuckDB as the storage and processing brains. Because of this, we're able to offer metrics, testing utilities, pipelines as code, and all the other DevOps utilities that are necessary to run a huge number of streaming instances 24x7. SQLFlow was created as the tool I wish I had for simple stream processing in production, high-availability contexts :)


Nice! Thanks for the context, it's great to know!

Stream to stream joins are NOT currently supported. This is a regularly requested feature, and I'll look at prioritizing it.

SQLFlow uses DuckDB internally for windowing and stream state storage :), and I'll look at extending it to support stream/stream joins.
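To give a rough sense of the windowing-state idea (illustrative only; SQLFlow keeps this state in DuckDB tables, and the function names here are made up):

```python
# Tumbling-window bucketing sketch: each event is assigned to the
# window containing its timestamp, and state is keyed by (window, key).
from collections import defaultdict

WINDOW_SECONDS = 60

def window_start(ts, size=WINDOW_SECONDS):
    """Floor a timestamp to the start of its tumbling window."""
    return ts - (ts % size)

counts = defaultdict(int)  # (window_start, key) -> running count
events = [(5, "a"), (59, "a"), (61, "b"), (130, "a")]
for ts, key in events:
    counts[(window_start(ts), key)] += 1
```

A stream/stream join would extend the same pattern: buffer both sides keyed by the join key plus window, and emit matches when windows close.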

Could you describe a bit more about your use case? I'd really appreciate it if you could create an issue in the repo describing your use case and desired functionality a bit!

https://github.com/turbolytics/sql-flow/issues

We were looking at solving some of the simpler use cases first before branching out into these more complicated ones :)


I worked on stream processing at my previous gig but don't have a need for it currently. Just curious.

Great article! This is also timely for me; I spent all of last week deep in the Postgres documentation learning about replication (wish I'd had this article then).

I'm building Kafka Connect (i.e. Debezium)-compatible replication:

https://github.com/turbolytics/librarian/pull/41

-----

I really like the Mongo change stream API. I think it elegantly hides a lot of the complexity of replication. As a side effect of building Postgres -> Kafka replication, I'm also trying to build a nicer replication primitive for Postgres that hides some of the underlying replication protocol complexity!
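As a sketch of the kind of change-stream-style primitive I mean (the `watch` name and event shape here are hypothetical, not librarian's actual API):

```python
# Hypothetical Mongo-change-stream-style interface over replication
# events: the caller iterates simple change documents, and the messy
# protocol details (LSN tracking, keepalives, relation messages, ...)
# stay hidden behind the generator.

def watch(raw_events):
    """Yield simplified change documents from raw replication events."""
    for ev in raw_events:
        yield {
            "operation": ev["op"],        # insert / update / delete
            "table": ev["table"],
            "document": ev.get("after"),  # row image after the change
        }

raw = [
    {"op": "insert", "table": "users", "after": {"id": 1, "name": "ada"}},
    {"op": "delete", "table": "users", "after": None},
]
changes = list(watch(raw))
```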


Still hacking on some data tools:

DuckDB for stream processing:

https://github.com/turbolytics/sql-flow

Lightweight Kafka stream processing using DuckDB as the execution engine. A 300MiB runtime can easily handle thousands of messages per second.

Working on a Kafka Connect alternative:

https://github.com/turbolytics/librarian

Right now, Mongo-to-Kafka replication (through change streams) is supported; Postgres support is in progress.


How are you defining “B2C businesses on Main Street”?

The reason I’m wondering is that it’s striking how much more financially challenged my Main Street is compared to VC tech.

I see so much opportunity for small/medium-business consulting in the analytics and process space, but in my experience these companies are really strapped for money and largely set in their ways.

In my experience, people are open to solving their problems. It’s just that the money makes it hard to be financially viable; the money involved is an order of magnitude smaller.

Another thing I’ve noticed is that the general baseline of process thinking and data-driven decision making is higher in tech than on Main Street.

A lot of my discussions are about how to present the problem in a way that somebody without decades of experience in operations can understand.

Another challenge I face regularly with Main Street companies is that people seem happy as they are; they’re not trying to continuously optimize the way I’m used to, coming from big tech. Even when it’s easy to present positive-ROI opportunities, there’s a comfort with the way things are done, and a lot of people seem happy to be limited by their scaling factors in exchange for that comfort.

TL;DR: the challenges are financial and mentality.


I define B2C businesses on Main Street as everything from a regional chain of plumbers, to a medical practice with 10 or 15 doctors in one area, to a realtor.

It sounds like you are very much motivated by optimizing processes and profitability, and/or the income that doing so provides you. What I've realized is that in many ways I'm not sufficiently motivated by that. What motivates me is the gratification I get from seeing every day people impacted by my work.

Nothing wrong with that, just different things for different people. In fact, you may be a more "enlightened" being because you can focus on the longer term rewards.


I think you’re right. For many businesses the tech is a background enabler rather than the focus. But if you find the right business and culture and leverage the tech, that’s the sweet spot. Tough to find though…


100% right. Any tips on finding it?

One thing I'm hearing over and over is "focus on impact, not process," which I think is becoming a growing need in tech as well.


Tough to say without knowing you, but I’d say figure out what you’re good at, what motivates you and look for people you enjoy working with. Maybe that takes you to a certain industry but maybe not.


> Even when it’s easy to present like positive ROI opportunities there’s just like a comfort with the way things are done

I could see that getting pretty frustrating


Productionizing DuckDB :) I built a streaming tool around DuckDB that allows for high-performance stream processing and a rich connector ecosystem:

https://github.com/turbolytics/sql-flow

Building a company around a tool is hard. There's been some interest, but streaming is kind of commoditized.

I'm taking everything I learned building it and working on a customer-facing security product, more to come on that :)


Built a stream processing engine using DuckDB:

https://github.com/turbolytics/sql-flow

It has gotten some interest; unfortunately, building tools as a business strategy is rough.

Beginning to work on the first actual product! More soon :)


I just started using Hurl a couple of months ago.

For my uses it's great that it has both a test suite mode and an individual invocation mode. I use it to execute a test suite of HTTP requests against a service in CI.

I'm not a super big fan of the configuration language; the blocks are not intuitive, and I found the documentation lacking for some of the supported assertions.

Overall the tool has been great, and has been extremely valuable.

I started using interface testing when working on POCs, and I found it helps with LLM-assisted development. Tests are written to directly exercise the HTTP endpoints, which allows the implementation to evolve fluidly as the project grows.

I also found the separation of testing very helpful; it further enforces the separation between interface and implementation. Before Hurl, my tests would be written in the test framework of the language the service was written in. The Hurl-based tests really help enforce the "client" perspective: there is no backdoor data access or anything, just strict separation between interface, tests, and implementation :)
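To give a sense of what these client-perspective tests look like, here's a minimal Hurl file (the endpoint and fields are made up; only the HTTP interface is exercised, never the service's internals):

```hurl
# Create a user, then read it back through the public API only.
POST http://localhost:8080/users
{
  "name": "ada"
}
HTTP 201
[Asserts]
jsonpath "$.id" exists

GET http://localhost:8080/users/1
HTTP 200
[Asserts]
jsonpath "$.name" == "ada"
```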


Maintainer here, thanks for the feedback. 6-7 years ago, when we started working on Hurl, we started with a JSON and then a YAML file format. We gradually convinced ourselves to write a new file format, and I completely understand that it might feel weird. We tried (maybe without succeeding!) to have something simple for the simple case...

I'm really interested in issues with the documentation: it can always be improved, and any issue filed is welcome!


I looked forward to move-out day at my state university in the early 2000s. The university would rent dumpsters and place them in the common outdoor areas. The dumpsters had an end door that would open, so it was easy to walk inside them without climbing.

I’d spend all morning in the dumpster with some friends. Name-brand clothes were good finds, and pretty much all the textbooks carried trade-in value. Lots of sealed snack foods as well.

I don’t know if the kids who threw them away were lazy or just didn’t know about buyback, but the books easily brought me $100 for a couple hours of morning dumpster diving.


The charitable take is that kids moving out from school have a ton of things to think about and deal with, so taking optimal care of probably too many possessions may not be at the top of the list.

