More

jinmingjian · on April 17, 2024

Location: China(UTC+8)

Remote: YES

Willing to relocate: YES

Technologies: Database, SQL, Rust, Java, JVM, Scala, WebAssembly, WASM, React, C/C++, Python, Javascript, Typescript, HTML5, CSS, TailwindCSS, Node, REST, Tauri, Linux, Kernel, IO_URING, BPF, LLVM, MLIR, LLM, LLAMA, ChatGPT, Finetuning, Vector Database, RAG, chatpdf, chatbi, chatexcel, BI, Analytics, Bigdata, Datalake, Data Warehouse, ClickHouse, PostgreSQL, Spark, Flink, Impala, Iceberg, Hudi, Kudu, Parquet, Arrow, SQL on hadoop, Redis, Distributed, AWS, GCP, Azure, SIMD, Lockfree, GPU, WebGPU, CUDA, ML, PyTorch, Triton, Compilation, JIT, VM, Docker, Cloud, MQTT, HTTP, TCP, IoT, Time series, Gateway, Ethereum, EVM, ZK

Résumé/CV: Send upon request

Email: jin.phd at gmail.com

14 years in work experience and 21 years in commercial software development, focused on engineering and science fields such as databases, compilers, deep learning, language design, high-performance computing, high-concurrency backend and network protocol and communication from data centers, to clouds and to decentralized internet.

jinmingjian · on March 15, 2023

MQTT and Kafka are different things. MQTT is not necessarily better or worse than Kafka, vice versa.

But, sadly, people are often caught in their own loops, when there are or can be created better options.

I have created a new, free, single binary data-service platform for IoT, JoinBase: https://joinbase.io/

This single binary data-service platform has proved:

1. Kafka is not more suitable for industry or higher performance than MQTT, even from the protocol level

With carefully crafting, JoinBase has saturated one PCIE 3.0 NVME sustained write bandwidth (25 million msg/s) in single modern node. This is, in fact, can not been done by the Java-based Kafka.

We have provided FREE full functionality community for testing: https://joinbase.io/products/

2. Use MQTT and Kafka together is unnecessary. This only makes your pipeline more complex, expensive, but much unstable and slower.

JoinBase can do arbitrary message preprocessing and auto-view(WIP). Streaming does not have to be owned through a separate monster.

3. High performance or ease of use, has nothing to do with the size of the software if the product can be properly engineered.

5MB Single binary JoinBase is enough to beat many monsters in the IoT/AIoT data pipleine: + sustained batch MQTT message write throughput: ~10x faster than Kafka and ~5x faster than that of one popular broker + basic SQL analytics: 3-4x faster than ClickHouse + HTTP interface concurrent queries: ~100x higher than ClickHouse + ... More could be seen in our 2022 summary blog: https://joinbase.io/blog/joinbase-2023/

There are historical reasons for all of these, of course. But it could be great that we break out of own mindset loops.

jinmingjian · on Dec 30, 2022

I am the author of JoinBase. It is interesting to see that one REST layer project on the top of PostgreSQL is up on the first page: https://news.ycombinator.com/item?id=34172205

JoinBase shares some basic ideas with PostgREST, but we provide a much simpler solution for the view of database: installation, configuration, authentication, schema creation and even performance are damn simple within JoinBase.

jinmingjian · on Dec 30, 2022

Shameless insertion:

We release our new HTTP interface with our free AIoT database - JoinBase:

HN discuss: https://news.ycombinator.com/item?id=34181591

or the blog link: https://joinbase.io/blog/http-interface/

Different to PostgREST, the HTTP interface of JoinBase is integrated in the database. So, no verbose setup, one binary rules them all!

Our HTTP interface is inspired from the ClickHouse, but we provide 100x message throughput than that of ClickHouse in the HTTP interface. JoinBase's HTTP interface is so fast that you can use it to provide unlimited production-level REST services without any worry.

If someone are interesting for JoinBase, just request the free distribution here: https://joinbase.io/request/

jinmingjian · on Nov 24, 2022

YMMV.

But the Apache License is truly dead for startups. Most of "open source" startups are living on the VC investments. If a good business cycle cannot be built, the achievements of open source cannot be truly maintained. This is the author's usefulness.

josephcsible · on Nov 24, 2022

> But the Apache License is truly dead for startups.

Maybe, but the right move is to copyleft, not to closed source.

jinmingjian · on Nov 24, 2022

Copyleft does not solve the problem: how to build a positive business cycle. No company seriously contributes to the copyleft. That is why the Apache License is raising. And you can see that all changed licenses mentioned in the article are not copyleft. Copyleft is great for licencors, but most of companies in the world, does not want to share their customs. This is the businesses.

The Linux kernel is almost the only one in the world and cannot be copied.

theamk · on Nov 24, 2022

For business, the copyleft is combined with proprietary license. So business gets "pro" version with no copyleft requirements, while open-source users get the same code under a different license.

And a lot of the companies mentioned in the article have moved to BSL, which _is_ copyleft (after a few years of delay), or to SSPL which is arguably even more extreme copyleft than AGPL.

josephcsible · on Nov 24, 2022

It's misleading to call the BSL and SSPL copyleft, because while they are viral, they're not free.

jinmingjian · on Nov 24, 2022

USAL distilled:

* The license is clear that you can anything to the licensed sources except the (re)distribution.

* With USAL, the advantages of Apache License are inherited, while the disadvantages of Apache License are avoided. (for "open source" infra startups)

* BSL breaks the greatest feat of open source, USAL fixes it.

theamk · on Nov 24, 2022

One of the big advantages of open source (and probably the most important one for many organizations/startups) is ensuring continuous access.

If you were paying for commercial license to developers before and the company behind the product decides to raise prices 100x overnight, or gets sold and closes down, you are screwed in closed-source/source-available case -- start paying more or migrate away from it. But in case of open source software, you'll likely to be able to find alternate source of support -- maybe a different community-maintained fork, or some independent consultants fixing the bugs just for you. This puts an upper limit on the software prices, which is great for consumers.

USAL specifically prohibits this - if you were dependent on USAL-licensed software, you might be able to get local engineers to support it, but you cannot collaborate with other users, nor you can ask any third parties to help.

Between this or BSL, I'd choose BSL any time. If something happens, at least BSL guarantees the software will become open-source eventually... USAL instead guarantees that software is lost forever.

jinmingjian · on Nov 24, 2022

"Third parties to help" is a good point. Although under the USAL, you can make "third parties" to become users.

Thanks for the suggestion! The USAL is new, it may be updated to allow pluggable additional grants to allow more flexibility, for example, if the licencor as a business entity does not exist, then the licensed works could be changed to another license.

jinmingjian · on Nov 9, 2022

I am the developer of JoinBase and happy to answer any question about it.

The interests of JoinBase is that, in combination with kinds of scenarios, we explore the infinite possibilities of expanding database technologies.

The database is is dead, long live the database.

theamk · on Nov 9, 2022

No documentation. No details except most basic ones (even something as simple as supported authentication or backup methods are not covered). No source code.

I don't see why anyone would ever use this.

jinmingjian · on Oct 24, 2022

sorry, JoinBase now is just freeware, but not open-sourced. If you are interesting, you can request here: https://joinbase.io/request/.

jinmingjian · on Oct 23, 2022

Interesting reading, thanks for sharing. This is quick summary:

The articles compare the performance of different MQTT brokers implemented in different languages, including D, C, Erlang, Go, Java and C++. The basic conclusion should be D impl is the winner in the loadtest case, but C (Mosquitto) are the total winner for two case: loadtest + pingtest. C++ impl is not good. And the author said he "really don’t want to write C++ again unless I have to".

Basically, the conclusion of the article is nice. For MQTT brokers, the primary case sub-to-pub is just a layer-7 forwarding. This is an IO intensive scenario, if your MQTT parser is not written too badly.

So, for the impls done in system level languages, like C or D should archive the similar performance. Here, C++ impl should be an exception, although I haven't looked into the exact reason.

These blog articles, in fact, are quite old. The 100k message/second got with hundreds of connections is top ranked. However, the current Intel 12th Gen notebook can run at 250k mps with just 1 connection. I have tested (and read) several open-source MQTT clients and brokers. These MQTT brokers including impls done in C, Java and Erlang, can run at 1 million to 2.x million mps in a single modern Xeon socket but hard to go upward.

The truth here is that, the free lunch given by the efficient modern kernel TCP stack is just at 1-2M mps. If you want a faster broker, you need to make this kind layer-7 forwarding faster in the computational part, i.e. faster parsing and parallelizing. Traditional optimization tricks comes back again, like cache-friendly coding. The result is that we reached around 8 million mps in the same modern box even with message parsing, data routing and data dump all done in the first version of JoinBase(https://joinbase.io/).

The free lunch from the Moore's Law is truly over. We should be prepared for this.

jinmingjian · on Oct 23, 2022

Thanks for all the 3 upvotes:) I am the developer of the OIDBS, and glad to answer any question here.

Kuinox · on Oct 23, 2022

I have a WIP, more or less working, MQTT Client & Server, is there a way for me to benchmark my implementation with your benchmark suite ?

jinmingjian · on Oct 23, 2022

Great. The OIDBS is using a built-in MQTT client. So, if you want o benchmark your server, just run the `import` command like as shown in the quick start video:

``` MODELS_ROOT=CURRENT_OIDBS_DIR import DIR_TO_DATA_SOURCE -n nyct_lite -d ```

here,

CURRENT_OIDBS_DIR is a hacky ENV var, for finding the models, should be removed in the future.

-n: is for the built-in model name, only the OIDBS repsect two models: nyct_lite and nyct_strip, you can get the two model sources from the article

-d: is for importing data only without injecting schemas. This is common if you are benching against a MQTT broker or server as you said. In fact, with this opt enabled, you can import any data source iff the data source is CSV or JSON. The benchmark logic is just to read and send the message line by line. That is, the files in the nyct_lite and nyct_strip directory is can be any CSV or JSON format. And the files'name is not important, but the model name still to be one of above two. Because the code checks the name.

(Ok, OIDBS may let the model name to be optional for such case, I will record this issue. thanks!)

If you have any problem, you can ask details in the community. I will help you as possible.