Ah, I see the article does mention "brute-force-like" — I must have skimmed past...

sdenton4 · 2025-09-26T19:48:45 1758916125

To be clear, I'm not the author of the post. But I do maintain a library for folks working with large audio datasets, built on a combination of SQLite and usearch. :)

dotancohen · 2025-09-27T09:00:23 1758963623

What library is that? My current project is working with voice recordings. My personal collection of voice recordings spans 20 years and measures in the high tens of GiB.

sdenton4 · 2025-09-27T14:28:48 1758983328

Here you go: https://github.com/google-research/perch-hoplite

It's geared towards bioacoustics in particular. It's pretty easy to provide a wrapper for any given model though. Feel free to send a ping if you try it out for speech; will be happy to hear how it goes.

dotancohen · 2025-09-27T19:46:34 1759002394

Interesting. Audio search isn't a problem I've thought about addressing, as I'll have transcriptions anyway. But knowing this exists might inspire some additional features or use cases that I haven't thought of yet. Thank you.

sdenton4 · 2025-09-28T19:24:40 1759087480

Yep, makes sense - conversion to text and then aligning the text with the audio is a very reasonable way to handle large volumes of speech data. For bioacoustics, we tend to have a loooooot of variation for which there is no real notation, and which may come from areas where we haven't seen much training data, or from taxa that don't have lots of scientists working on them (e.g., insects). So working with the raw audio embeddings tends to be best.
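
For anyone wondering what searching over raw embeddings looks like at its simplest, here is a minimal sketch in Python using only the standard library. It is not hoplite's actual API: the table layout, the clip labels, and the three-dimensional vectors are all made up for illustration, and it does a brute-force cosine scan where a real setup would presumably hand the vectors to an approximate nearest-neighbor index such as usearch.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a list of floats into float32 bytes for storage in a BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_db(clips):
    """Create an in-memory SQLite table holding (label, embedding) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clips (id INTEGER PRIMARY KEY, label TEXT, embedding BLOB)"
    )
    conn.executemany(
        "INSERT INTO clips (label, embedding) VALUES (?, ?)",
        [(label, to_blob(vec)) for label, vec in clips],
    )
    conn.commit()
    return conn


def search(conn, query, k=2):
    """Score every stored embedding against the query and return the top-k labels."""
    rows = conn.execute("SELECT label, embedding FROM clips")
    scored = [(cosine(query, from_blob(blob)), label) for label, blob in rows]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]


# Hypothetical toy data: in practice the embeddings would come from an audio model.
conn = build_db(
    [
        ("cricket_a", [0.9, 0.1, 0.0]),
        ("cricket_b", [0.8, 0.2, 0.1]),
        ("frog", [0.0, 0.1, 0.9]),
    ]
)
results = search(conn, [1.0, 0.0, 0.0], k=2)
print(results)
```

The point is only that no labels or transcriptions are involved: you query with an example embedding and get back the clips that sound most like it.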