I mostly shared the launch post and ran promo campaigns on Reddit, Product Hunt, LinkedIn, and Discord, and even tried HN (got no replies there -‿-"). Since then, it's mostly been word of mouth.
Thanks to my customers' feedback, I've made a lot of improvements to the app as well. It feels good getting positive feedback and hearing from people about their use cases. :)
I work on an image search engine[0]. The main idea has been to preserve all the original metadata and directory structure while allowing semantic and metadata search from a single interface. All metadata is stored in a single JSON file, with the original paths and filenames, in case you ever want to create backups. Instead of uploading photos to a server, you could host it on a cheap VPS with enough space and index there instead (by default it's a local app). It is just an engine, though, and doesn't provide auth or specific features like sharing albums!
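To make the single-JSON-file idea concrete, here is a hypothetical sketch of what such an index could look like (the actual schema isn't shown in the post; all keys and values below are illustrative):

```python
import json

# Hypothetical metadata index: one JSON file keyed by a stable image id,
# preserving the original absolute path and filename so a backup/restore
# can recreate the directory structure exactly.
index = {
    "img_0001": {
        "path": "/photos/2021/trip/IMG_0001.jpg",  # original location, never rewritten
        "filename": "IMG_0001.jpg",
        "tags": ["girl", "beach"],                 # person/tag names
        "exif": {"DateTimeOriginal": "2021:07:14 10:32:01"},
    }
}

# The whole index round-trips losslessly through a single JSON file.
blob = json.dumps(index)
assert json.loads(blob) == index
```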
Currently the (semantic) ML model is the weakest part: a minorly fine-tuned ViT-B/32 variant, acting more like a placeholder, i.e. very easy to swap for a desired model. (DINO models have been pretty great, being trained on a much cleaner and larger dataset; CLIP was one of the first image-text models!)
Regarding the "girl drinking water" point: "girl" is the person/tagged name, and "drinking water" just re-ranks all of "girl"'s photos (rather than finding all photos of a generic girl drinking water).
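In other words, the query runs in two stages: a hard filter on the tagged name, then a semantic re-rank of only those candidates. A minimal sketch of that logic in pure Python (function names and the toy 2-D embeddings are made up for illustration, not the engine's actual code):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(photos, tag, query_vec):
    # Stage 1: hard filter on the person/tag name.
    candidates = [p for p in photos if tag in p["tags"]]
    # Stage 2: re-rank only those candidates by semantic similarity.
    return sorted(candidates, key=lambda p: cosine(p["vec"], query_vec), reverse=True)

photos = [
    {"name": "a.jpg", "tags": ["girl"], "vec": [0.9, 0.1]},
    {"name": "b.jpg", "tags": ["girl"], "vec": [0.1, 0.9]},
    {"name": "c.jpg", "tags": ["dog"],  "vec": [0.9, 0.1]},  # excluded by the tag filter
]
result = search(photos, "girl", [1.0, 0.0])
# "c.jpg" never appears, no matter how well its embedding matches the query.
```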
I have been more focused on making the indexing pipeline more performant by reducing copies and speeding up bottleneck portions by writing them in Nim. Fusing semantic features with metadata is the more interesting and challenging part, compared to choosing an embedding model!
I have been working on this project for quite some time now. For such search engines the basic ideas remain the same, i.e. extracting metadata or semantic info and providing an interface to query it, but a lot of effort has gone into making those modules performant while keeping dependencies minimal. The current version is down to only three dependencies (numpy, markupsafe, and ftfy) plus a Python installation, with no hard dependence on any specific version. A lot of code is written from scratch, including a meta-indexing engine and a minimal vector database. Being able to index any personal data from multiple devices or services without duplicating it has been the main theme of the project so far!
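For readers unfamiliar with the term, a "minimal vector database" can be surprisingly small. Here is a toy brute-force version in pure Python (my own sketch under the assumption of cosine-similarity search; the project's actual implementation isn't shown):

```python
import heapq
import math

class MiniVectorDB:
    """Toy brute-force vector store: normalize on insert, dot-product top-k."""

    def __init__(self):
        self.keys = []
        self.vecs = []

    def add(self, key, vec):
        # Normalizing once at insert time makes search a plain dot product.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        self.keys.append(key)
        self.vecs.append([x / norm for x in vec])

    def search(self, query, k=3):
        norm = math.sqrt(sum(x * x for x in query)) or 1.0
        q = [x / norm for x in query]
        scores = (
            (sum(a * b for a, b in zip(v, q)), key)
            for key, v in zip(self.keys, self.vecs)
        )
        # Top-k by cosine similarity, highest first.
        return heapq.nlargest(k, scores)

db = MiniVectorDB()
db.add("cat.jpg", [1.0, 0.0])
db.add("dog.jpg", [0.8, 0.6])
db.add("car.jpg", [0.0, 1.0])
top = db.search([1.0, 0.1], k=2)
```

A real engine would add persistence and maybe an approximate index, but for a few hundred thousand vectors a tight brute-force scan is often fast enough.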
We (my friend and I) have already tested it on around 180 GB of the Pexels dataset and up to 500k images from the Flickr 10M dataset. The machine learning models are powered by a framework completely written in Nim (currently not open source) whose only dependency is oneDNN (which will have to go in order to run on ARM machines!).
I have been mainly looking for feedback to smooth out some rough edges, but it has been worthwhile to work on this project, which includes code ranging from assembly to HTML!
Using `zig cc` (clang) to pin a particular libc version is one of the best decisions I have made; it saved me from those meaningless libc mismatch errors!
I know it may not be what you are looking for, but most such models generate multi-scale image features through an image encoder, and those can be very easily fine-tuned for a particular task, like polygon prediction in your use case. I understand that the main benefit of promptable models is to reduce/remove this kind of work in the first place, but it could be worthwhile and much more accurate if you have a specific high-load task!
Pretty cool project! I have also been trying to do something similar with a very limited set of (abstract) ops akin to fundamental computer instructions. I'm just using the numpy backend for now to test the theory, but the neat thing is that most of the complexity lies in the abstract space, like deciding which memory accesses could be coalesced even before generating the final code for a specific backend! As far as I know, most DL compilers struggle to generate optimal code as models get bigger and bigger. Halide was/is a very cool project that sped up many kernels just by finding better cache/memory access patterns. If you happen to share more insights about your project through blog posts or a whitepaper, that would be really helpful.
A bit of a tangential point about Parakeet and other Nvidia NeMo models: I never found actual architecture implementations as PyTorch/TF code; it seems like all such models are instantiated from a binary blob, making it difficult to experiment! Maybe I missed something. Does anyone here have more experience with .nemo models to shed some light on this?
Nice write-up!
It is about speeding up the underlying hash table for a `cuckoo filter`, where false positives are allowed, unlike a data structure like a dictionary, which guarantees a false-positive rate of 0.
Whenever I need a dictionary/hash table in a hot loop where the maximum possible number of entries is known, I use a custom hash table initialized with that capacity. A standard library implementation may store the `hash` alongside the `key` and `value` in case it needs to `resize`; in my case I don't store the `hash`, which gives a speedup of about 15-20% due to fewer cache misses, since no resize is possible!
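A minimal sketch of that idea (a fixed-capacity open-addressing table; the class name and layout are illustrative, not my actual implementation): because the table is sized for the known maximum up front, it never resizes, so the hash never needs to be stored alongside each entry.

```python
class FixedTable:
    """Open-addressing hash table with fixed capacity and no stored hashes.

    Since the maximum number of inserts is known up front, the table can
    never resize, so each slot holds only (key, value): one less word per
    entry and fewer cache misses while probing.
    """

    def __init__(self, max_items):
        # Size to ~2x the known maximum, rounded up to a power of two,
        # to keep the load factor low and probe chains short.
        self.cap = 1
        while self.cap < 2 * max_items:
            self.cap *= 2
        self.slots = [None] * self.cap  # each slot: None or (key, value)

    def _probe(self, key):
        i = hash(key) & (self.cap - 1)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) & (self.cap - 1)  # linear probing
        return i

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else default

t = FixedTable(max_items=100)
for n in range(100):
    t.put(n, n * n)
```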
Also, `cuckoo filters` seem to use a `robin hood hashing`-like strategy; wouldn't using one in a hot loop lead to a higher insert cost?
Thank you! You are right. Cuckoo filters use the open-addressing technique to deal with collisions, which may result in a long eviction chain on inserts, or even fail if the load factor is high or you're just unlucky. This is one major drawback to consider before using them.
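To illustrate the eviction chain, here is a toy insert routine following the standard partial-key cuckoo hashing scheme (each item has a small fingerprint and two candidate buckets). This is my own sketch of the textbook algorithm, not the blog post's implementation; constants and hash choices are arbitrary:

```python
import random

NUM_BUCKETS = 64   # power of two, so the XOR bucket relation stays in range
BUCKET_SIZE = 4
MAX_KICKS = 500    # bound on the eviction chain; exceeding it = insert failure

buckets = [[] for _ in range(NUM_BUCKETS)]

def fingerprint(item):
    return hash(("fp", item)) & 0xFF or 1  # 8-bit, never zero

def alt_bucket(i, fp):
    # The alternate bucket is derivable from the fingerprint alone,
    # so an evicted fingerprint can always be relocated.
    return (i ^ hash(("bucket", fp))) & (NUM_BUCKETS - 1)

def insert(item):
    fp = fingerprint(item)
    i1 = hash(item) & (NUM_BUCKETS - 1)
    i2 = alt_bucket(i1, fp)
    for i in (i1, i2):
        if len(buckets[i]) < BUCKET_SIZE:
            buckets[i].append(fp)
            return True
    # Both buckets full: start the eviction chain.
    i = random.choice((i1, i2))
    for _ in range(MAX_KICKS):
        victim = random.randrange(BUCKET_SIZE)
        fp, buckets[i][victim] = buckets[i][victim], fp  # kick a resident out
        i = alt_bucket(i, fp)  # try the victim's other bucket
        if len(buckets[i]) < BUCKET_SIZE:
            buckets[i].append(fp)
            return True
    return False  # table too loaded; a real filter would report "full"

def lookup(item):
    fp = fingerprint(item)
    i1 = hash(item) & (NUM_BUCKETS - 1)
    return fp in buckets[i1] or fp in buckets[alt_bucket(i1, fp)]

for n in range(100):
    insert(n)
```

The `MAX_KICKS` loop is exactly the cost the parent comment asks about: at low load it almost never runs, but near capacity a single insert can touch many buckets.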
https://maltsev.space/blog/010-cuckoo-filters#fingerprints-a...
Are you trying to do this message passing using Nim channels?
For my use cases, I always had to resort to just passing `pointers` to prevent any copying. Most of the time I divide the problem so that each thread writes to independent memory locations, avoiding locks, but that is not a general pattern for sure. For read-only purposes I just use `cursor` while casting to the appropriate type. If you find a useful pattern, please share.
No. Nim channels are for inter-thread communication within the same process, and they perform a deep-copy of messages. Fusion channels are a different beast. They're kernel objects that require syscalls to create/open/send/recv/close. In order to achieve zero-copy, the channel heap needs to be in user-space, but managed by the kernel. It's kind of similar to shared memory on POSIX, but has message passing semantics, rather than arbitrary read/write from/to shared memory. And since Fusion is a single address space operating system, I have the luxury of passing pointers (to data on the channel heap) directly between tasks without resorting to (de)serialization. Protection is achieved through page table mappings, where e.g. the sender would have read/write access to the channel heap, but the receiver has read-only access to the same memory.
As for your case, I understand the need to avoid locks. In my case, the channel is implemented as a queue of messages, which is implicitly synchronized between tasks. Currently it's implemented using a blocking queue, since I want to put senders/receivers to sleep when the queue is full/empty, respectively. The API does support a no-wait option, but I'm not currently using it.
There will be a lot to write about once I'm past this point :)
Thanks for explaining it! Writing it from scratch certainly gives you a lot of control in modelling a particular feature.
I bookmarked this project a few months ago but couldn't spend the time to understand it better. I wasn't aware of the documentation, which should now make it easy to get started. Thanks for putting so much work into it!