cyrusthegreat · on July 18, 2022

Tiny wooden boxes.

cyrusthegreat · on June 30, 2022

This is awesome! Permanently bookmarked!

cyrusthegreat · on Sept 16, 2021

Hey! We're actually polishing up a PR that will add documentation and finalize the versioning API; it should be merged this weekend. Would you be up for a quick chat with someone on our team? It would be interesting to get your feedback and see what else we're missing to be a drop-in replacement for OpenDistro. If so, join our Slack and we'll DM you :) https://join.slack.com/t/featureform-community/shared_invite...

cyrusthegreat · on Sept 16, 2021

This is in the works! We'd love your feedback on the API and to learn a bit more about your use case so we build the right thing. Mind joining our Slack? https://join.slack.com/t/featureform-community/shared_invite...

cyrusthegreat · on Sept 16, 2021

Yes! We plan to bring Faiss in and use a lot of its functionality. Our goal for this release was to get an end-to-end version working so we could get feedback on the API, and HNSW was a good default with that in mind.

jamesblonde · on Sept 16, 2021

How does it compare to the OpenDistro for Elastic KNN plugin - which also uses HNSW (and also includes scalable storage, high availability, backups, and filtering)?

cyrusthegreat · on Sept 16, 2021

Our API is built from the ground up with the machine learning workflow in mind. For example, we have a training API that allows you to batch requests and even download your embeddings and generate an HNSW index locally. Our view of versioning, rollbacks, and more makes a lot of sense for an ML index, but very little sense for a search index.
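To make the versioning and rollback idea concrete, here is a minimal sketch of a versioned embedding table. The class `VersionedEmbeddings` and its methods are hypothetical names made up for illustration; this is not Embeddinghub's actual API, just one way the concept could look.

```python
from typing import Dict, List, Optional


class VersionedEmbeddings:
    """Hypothetical versioned embedding table (illustration only)."""

    def __init__(self) -> None:
        # Every published version keeps its own full copy of the embeddings.
        self._versions: Dict[str, Dict[str, List[float]]] = {}
        # Ordered list of versions that have been served; the last is current.
        self._history: List[str] = []

    def publish(self, version: str, embeddings: Dict[str, List[float]]) -> None:
        """Store a new immutable version and make it the current one."""
        if version in self._versions:
            raise ValueError(f"version {version!r} already exists")
        self._versions[version] = {k: list(v) for k, v in embeddings.items()}
        self._history.append(version)

    @property
    def current(self) -> str:
        if not self._history:
            raise LookupError("no version has been published")
        return self._history[-1]

    def get(self, key: str, version: Optional[str] = None) -> List[float]:
        """Look up an embedding in the current version, or in a pinned one."""
        return self._versions[version or self.current][key]

    def rollback(self) -> str:
        """Serve the previous version again; the newer data stays stored."""
        if len(self._history) < 2:
            raise LookupError("no earlier version to roll back to")
        self._history.pop()
        return self.current


store = VersionedEmbeddings()
store.publish("v1", {"user_1": [0.1, 0.2]})
store.publish("v2", {"user_1": [0.3, 0.4]})
latest = store.get("user_1")                 # served from v2
store.rollback()
after_rollback = store.get("user_1")         # served from v1 again
pinned = store.get("user_1", version="v2")   # v2 is still readable when pinned
```

A model trained against "v2" can keep pinning that version while serving falls back to "v1", which is the kind of behavior that matters for an ML index but rarely for a search index.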

cyrusthegreat · on Sept 16, 2021

Pinecone is closed source and only available as SaaS. We have more overlap with Milvus, but we’re focused on the embeddings workflow, such as versioning and using embeddings alongside other features, whereas Milvus is entirely focused on nearest neighbor operations.

Faiss solves the approximate nearest neighbor problem, not the storage problem. It’s not a database, it’s an index. We use HNSWLIB, a lighter-weight alternative to Faiss, to index embeddings in Embeddinghub.

cyrusthegreat · on Sept 16, 2021

Gensim is great for generating certain types of embeddings, but not for operationalizing them. It doesn’t do approximate nearest neighbor lookup, which is a deal breaker for most models that use embeddings at scale. It also does not manage versioning, so you end up having to hack a workflow around it to manage embeddings. Finally, it’s not really data infrastructure like this is, so you end up doing hacky things like copying all your embeddings into every Docker image. With regard to serving embeddings, Gensim is just a library that supports in-memory brute-force nearest neighbor lookups.
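For readers unfamiliar with the term, "in-memory brute-force nearest neighbor lookup" means scoring the query against every stored vector. A minimal pure-Python sketch follows; the function names and toy vectors are made up for illustration, and this is not Gensim's code.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_nearest(
    query: List[float], embeddings: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the top k."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings, invented for this example.
embeddings = {
    "apple": [1.0, 0.0],
    "orange": [0.9, 0.1],
    "car": [0.0, 1.0],
}
result = brute_force_nearest([1.0, 0.05], embeddings, k=2)
nearest_keys = [key for key, _ in result]
```

Each query costs time proportional to the number of stored vectors, which is why approximate indexes such as HNSW are used once the collection grows large.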

ymroddi · on Sept 20, 2021

Gensim actually lets you use both Annoy and NMSLIB with Gensim-generated vectors as part of its API.

https://radimrehurek.com/gensim/similarities/nmslib.html

https://radimrehurek.com/gensim/similarities/annoy.html

cyrusthegreat · on Sept 16, 2021

Thanks for the kind words! We'd love to get your feedback as we iterate. Please join our slack community: https://join.slack.com/t/featureform-community/shared_invite...

cyrusthegreat · on Sept 16, 2021

Not yet. This is very much an early release to get it into people's hands and to get feedback on the API and the functionality. We've purposely held off optimizing too much until we feel more confident that this is useful and that our API approach makes sense for people. That said, Simba, who is one of the main devs, actually comes from a performance tuning background at Google. Also, it's built on HNSWLIB and RocksDB, and is being used in real-world workloads today.

cyrusthegreat · on Sept 16, 2021

We use HNSW internally via HNSWLIB. It's the same algorithm that Facebook uses to power their embedding search.

sathergate · on Sept 16, 2021

thanks! how did you make the decision to use hnsw over faiss and other search algorithms?

cyrusthegreat · on Sept 16, 2021

Faiss also uses HNSW internally; HNSWLIB is just a lighter-weight implementation, which allowed us to iterate faster. In the future we plan to swap it out for Faiss to take advantage of its full range of functionality.