Hacker News | aksx's comments

What do you use to index image vectors? Does a "retrieval based model" not require an index?


The "retrieval based model" refers to https://github.com/CompVis/latent-diffusion#retrieval-augmen..., which uses ScaNN to train a knn embedding searcher.
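For a sense of what such a searcher computes, here is a minimal Go sketch of exact, brute-force nearest-neighbour search over embeddings. It is not ScaNN and not the latent-diffusion code: ScaNN is an approximate nearest-neighbour library, and the names `cosine` and `nearest` are made up for illustration. An index exists to avoid the full scan this sketch does on every query.

```go
package main

import (
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
// Hypothetical helper, used here only for illustration.
func cosine(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// nearest returns the indices of the k vectors most similar to query,
// best match first. It scans every vector, so the cost grows linearly
// with the collection size.
func nearest(vectors [][]float64, query []float64, k int) []int {
	ids := make([]int, len(vectors))
	scores := make([]float64, len(vectors))
	for i, v := range vectors {
		ids[i] = i
		scores[i] = cosine(v, query)
	}
	sort.SliceStable(ids, func(x, y int) bool {
		return scores[ids[x]] > scores[ids[y]]
	})
	if k < 0 {
		k = 0
	}
	if k > len(ids) {
		k = len(ids)
	}
	return ids[:k]
}
```

Calling `nearest(vectors, query, 5)` returns the positions of the five closest embeddings. An approximate index trades a little recall for not having to touch every vector.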


Maybe because other people search in different places.

Events/Businesses: Google Maps, Instagram

Trends/Tutorials: TikTok, IG Reels

News: Twitter/Social Media/News Outlet of choice

Google is for finding new sources of information, and either the answer is instant (the Google info box) or it takes too long, which leaves a lasting memory of it being painful.



I have heard teammates call it that before but thought it was just a guess. Seems it's official.

Thanks for sharing :)


So many interesting findings in the post.

The tantivy binary search and the Rust binary search differ in only one thing in the end: knowing the size of the slice ahead of time, right?

Could the Rust compiler infer the size ahead of time?


With inlining maybe?
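To make the "size known ahead of time" point concrete, here is a rough sketch in Go (not Rust, and not tantivy's actual code) of a lower-bound search over a fixed-size block. `blockLen` and `searchFixed` are names made up for this example. Because the length is a compile-time constant and a power of two, the sequence of steps never depends on the data, which is the property that lets an optimizer unroll the loop. Whether a given compiler really does so is not something this sketch shows.

```go
package main

// blockLen is a hypothetical block size fixed at compile time.
// It must be a power of two for the step sequence below to cover the block.
const blockLen = 128

// searchFixed returns the index of the first element >= target in a sorted
// block, or blockLen if every element is smaller than target.
func searchFixed(block *[blockLen]uint32, target uint32) int {
	start := 0
	// The steps are always 64, 32, ..., 1: the trip count depends only on
	// blockLen, never on the contents of the block.
	for step := blockLen / 2; step > 0; step /= 2 {
		if block[start+step-1] < target {
			start += step
		}
	}
	// One final comparison settles the last remaining candidate.
	if block[start] < target {
		start++
	}
	return start
}
```

With a plain slice the loop bound comes from `len(slice)` at run time, so the same unrolling is only possible if the compiler can prove the length, for example after inlining into a caller that passes a fixed-size array.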


Even if your current employer lets you talk about specific numbers, without context they don't tell the recruiter anything.

"The API latency was 10 seconds, now it's 8.5 seconds."

or

"I reduced the p99.9 latency from 1 sec to 400ms" (by removing input validation)


If the best thing you did was take an API from 10 to 8.5, it's probably best to tell the recruiter as little as possible.

Your second line is an improvement for sure, but it follows the same basic advice: give numbers.


So Costco pays small companies to sell their products and extract sales data.

Should Amazon start paying small companies at the same margin that Costco does?


I suggested building this last year at the bigco where I work, built a prototype using Postgres, and wrote the design doc.

Sadly it wasn't picked up.

Congrats on the product, there is a big market for it.


Could you tell us a little more about your experience, if that's acceptable? Maybe in private messages, on FB or LinkedIn?


zoekt is absolutely awesome. I was reading its code yesterday to figure out how it does ngram indexing and search.


Capitalism is a system's set of connections between subsystems, and politics is a way to reorganize a system from within itself.

> can the system still be simplified enough to grasp it well enough to change it?

That would be a model of a system, not the system itself, and as George Box said, "All models are wrong, but some are useful."

> where would an individual who has grasped it find the power to change anything?

If an individual is able to change connections between subsystems, they can change the behavior of the higher level system. But the impact of the change is dependent on the systems being changed.


> That would be a model of a system not the system itself

Also worth keeping in mind that a more cybernetic model is totally fine with complex black boxes within the model. This way you can sometimes find truth going from big to small.


I am all for the Go idiom that variable name length should be directly proportional to its scope, but the code samples in the post have taken it a bit too far, which makes the code harder to read.

It would be so much more readable if the struct had more descriptive names:

    type Cuckoo struct {
        buckets           []bucket
        numBuckets        uint
        entriesPerBucket  uint
        fingerprintLength uint
        capacity          uint
    }
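As a rough illustration of how the longer names read in use, here is a self-contained toy cuckoo filter in Go. It is not the code from the post. The 16-bit `fingerprint` type, the FNV hash, `maxKicks`, and every method name are assumptions made for this sketch, and it keeps only a subset of the fields above.

```go
package main

import (
	"hash/fnv"
	"math/rand"
)

// fingerprint is a short hash of an item; 0 is reserved for "empty slot".
type fingerprint uint16

type bucket []fingerprint

// Cuckoo is a toy filter using the descriptive field names suggested above.
type Cuckoo struct {
	buckets          []bucket
	numBuckets       uint // must be a power of two for the XOR step below
	entriesPerBucket uint
}

// maxKicks bounds the number of evictions per insert (an arbitrary choice).
const maxKicks = 500

func NewCuckoo(numBuckets, entriesPerBucket uint) *Cuckoo {
	if numBuckets == 0 || numBuckets&(numBuckets-1) != 0 || entriesPerBucket == 0 {
		panic("numBuckets must be a power of two and entriesPerBucket positive")
	}
	buckets := make([]bucket, numBuckets)
	for i := range buckets {
		buckets[i] = make(bucket, entriesPerBucket)
	}
	return &Cuckoo{buckets: buckets, numBuckets: numBuckets, entriesPerBucket: entriesPerBucket}
}

func hash64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// altIndex maps a bucket index to the fingerprint's other candidate bucket.
// Applying it twice returns the original index.
func (c *Cuckoo) altIndex(index uint, fp fingerprint) uint {
	return (index ^ uint(hash64([]byte{byte(fp), byte(fp >> 8)}))) & (c.numBuckets - 1)
}

// locate derives the fingerprint and both candidate buckets for an item.
func (c *Cuckoo) locate(item string) (fp fingerprint, first, second uint) {
	h := hash64([]byte(item))
	fp = fingerprint(h >> 48)
	if fp == 0 {
		fp = 1 // keep 0 free as the empty marker
	}
	first = uint(h) & (c.numBuckets - 1)
	return fp, first, c.altIndex(first, fp)
}

// store puts fp into a free slot of the bucket, reporting whether one existed.
func (c *Cuckoo) store(index uint, fp fingerprint) bool {
	for slot, existing := range c.buckets[index] {
		if existing == 0 {
			c.buckets[index][slot] = fp
			return true
		}
	}
	return false
}

// Insert adds an item, evicting existing fingerprints when both buckets are full.
func (c *Cuckoo) Insert(item string) bool {
	fp, first, second := c.locate(item)
	if c.store(first, fp) || c.store(second, fp) {
		return true
	}
	index := first
	for kick := 0; kick < maxKicks; kick++ {
		// Swap our fingerprint with a random resident, then rehome the resident.
		slot := rand.Intn(int(c.entriesPerBucket))
		fp, c.buckets[index][slot] = c.buckets[index][slot], fp
		index = c.altIndex(index, fp)
		if c.store(index, fp) {
			return true
		}
	}
	return false // filter is considered full; the last evicted fingerprint is dropped
}

// Contains reports whether the item may be present (false positives are possible).
func (c *Cuckoo) Contains(item string) bool {
	fp, first, second := c.locate(item)
	for _, index := range []uint{first, second} {
		for _, existing := range c.buckets[index] {
			if existing == fp {
				return true
			}
		}
	}
	return false
}
```

With the longer names, a line like `rand.Intn(int(c.entriesPerBucket))` explains itself without needing the paper open next to the code.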


It's a good point - and I wasn't entirely sure what to go with.

In the end, I went with the shorter variable names as they are used in the paper I've linked: https://www.pdl.cmu.edu/PDL-FTP/FS/cuckoo-conext2014.pdf

In a production version of this, longer variable names would be a good thing. :)

