
The only thing Whisper is missing is speaker diarization. I'm currently working on a model that uses Whisper + pyannote to transcribe interviews and also detects who is speaking. It's working, but damn, it takes so long.
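
For anyone wondering what the "Whisper + pyannote" glue step might look like, here is a minimal sketch, not the code from the project above. It assumes you already have transcript segments and diarization turns as plain tuples (the `Segment` and `Turn` shapes and the `assign_speakers` name are made up for illustration) and labels each segment with the speaker whose turn overlaps it the most.

```python
from typing import List, Tuple

# Hypothetical shapes, chosen for illustration only:
Segment = Tuple[float, float, str]  # (start_sec, end_sec, text) from the transcriber
Turn = Tuple[float, float, str]     # (start_sec, end_sec, speaker) from the diarizer


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "UNKNOWN"
) -> List[Tuple[str, str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, text))
    return labeled


if __name__ == "__main__":
    segments = [(0.0, 2.0, "Hi, thanks for joining."), (2.0, 5.0, "Happy to be here.")]
    turns = [(0.0, 2.2, "SPEAKER_00"), (2.2, 6.0, "SPEAKER_01")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

This is a quadratic loop, which is fine for a single interview; the slow part of such a pipeline is presumably the two models, not this merge.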


Can you not separate into two phases? Speech separation to get source per speaker, and then whisper on each in isolation (maybe interlacing prompts)?



I'm badly looking for that! Is there a repo I can follow?


not GP (hoping he responds tho) but i've been collecting a couple of diarization options: https://github.com/sw-yx/ai-notes/blob/main/AUDIO.md

basically whisper.cpp has some support but it's not great (based on my own testing)

- https://huggingface.co/spaces/vumichien/whisper-speaker-diar...

- https://github.com/Majdoddin/nlp pyannote diarization

- whisperX with diarization https://twitter.com/maxhbain/status/1619698716914622466 https://github.com/m-bain/whisperX


I can share my repo when it's finished. In the meantime, you can take a look at this: https://huggingface.co/spaces/vumichien/whisper-speaker-diar...


My goal for my project is to build a tool that transcribes interviews (e.g., in sales or recruiting) and puts the transcription through ChatGPT (waiting for the API atm) to make a summary that looks like the notes of the call. Speaker diarization is important so that I don't put more than 4000 tokens of input into ChatGPT. I will see how it goes, but if it's reliable enough (looks like it so far), it will save the time it takes to write meeting notes and rewrite them to send to someone after the call (hiring managers etc.). Imagine a 10x Otter.ai or something like that.
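
One way to stay under an input budget like that is to pack whole speaker turns into chunks and summarize each chunk separately. A rough sketch under stated assumptions: `rough_token_count` is a crude word count standing in for a real tokenizer (real token counts will be higher), and `chunk_turns` is a hypothetical helper name, not part of any library.

```python
from typing import Callable, List, Tuple


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words.
    Assumption for illustration only; swap in a real tokenizer for real use."""
    return len(text.split())


def chunk_turns(
    turns: List[Tuple[str, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Greedily pack "speaker: text" lines into chunks of at most max_tokens.
    Turns are never split, so a single oversized turn becomes its own chunk."""
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for speaker, text in turns:
        line = f"{speaker}: {text}"
        cost = count_tokens(line)
        # Close the current chunk if adding this turn would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    turns = [
        ("A", "one two three"),
        ("B", "four five"),
        ("A", "six seven eight nine"),
    ]
    for i, chunk in enumerate(chunk_turns(turns, max_tokens=7)):
        print(f"--- chunk {i} ---\n{chunk}")
```

Keeping turns intact means each chunk still reads as a dialogue, which is where the diarization pays off.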


Why are you waiting for the API? The OpenAI Playground has API examples you can copy and paste. You can go over 4000 tokens if you have a business justification and payment method. You have access to most of their models, even the new Codex ones.

Edit: Looked at your link and I misunderstood. I think I understand you're waiting for the ChatGPT specific model now?


> You can go over 4000 tokens if you have a business justification and payment method.

That's incorrect


You are correct that I was incorrect. Thank you for correcting me. I misread their documentation. Sounds like they might increase the token limit in the future, but right now it's 4097 tokens shared with the prompt.
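
To make "shared with the prompt" concrete: whatever the prompt uses comes out of the same budget as the completion. A tiny sketch of the arithmetic, taking the 4097 figure from this comment as given; `max_completion_tokens` is a made-up helper name, not an API call.

```python
CONTEXT_LIMIT = 4097  # limit quoted in the comment above; assumed, may change


def max_completion_tokens(prompt_tokens: int, context_limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left for the completion once the prompt has taken its share."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_limit - prompt_tokens)


if __name__ == "__main__":
    # A 3000-token transcript chunk leaves 1097 tokens for the summary.
    print(max_completion_tokens(3000))
```

So a long transcript chunk directly shrinks the room left for the summary.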


Ha. I’m also doing something similar with a friend at https://www.paxo.ai. Funny that we all seemed to have a similar idea, all at once.


What did you build the landing page with?


From the source code: <!DOCTYPE html><!-- This site was created in Webflow. https://www.webflow.com -->


I also started building the same thing. Crazy that something that used to be nearly impossible will soon be a "hello world" type project


Sounds interesting. Do you have a page?


Ok. Our service is pretty fast. Also, the M-series Macs are really fast imo.


What's your training rig like?



