I just ran it again and happened to get an even better time, under 7 seconds without loading and 13.08 seconds including loading. In case anyone is curious about the use of Flash Attention, I tried without it and transcription took under 10 seconds, 15.3 including loading.
Another question that's only slightly related, but while we're here...
Using OAI's paid Whisper API, you can give a text prompt to a) set the tone/style of the transcription and b) teach it technical terms, names etc that it might not be familiar with and should expect in the audio to transcribe.
Am I correct that this isn't possible with any released versions of Whisper, or is there a way to do it on my machine that I'm not aware of?
You can definitely do this with the open source version. Many transcription implementations use it to maintain context between the max-30-second chunks Whisper natively supports.
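To make that concrete, here's a minimal sketch using the open-source `openai-whisper` package's `initial_prompt` parameter, which seeds the decoder much like the paid API's prompt field (the model size and file path are placeholders):

```python
def transcribe_with_prompt(audio_path: str, prompt: str) -> str:
    """Transcribe an audio file, biasing the model with a text prompt.

    Words in `prompt` (jargon, names, preferred spellings) are fed to the
    decoder as context before the audio, so the model is more likely to
    produce them in the output.
    """
    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")  # placeholder; pick any model size
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return result["text"]
```

For example, `transcribe_with_prompt("meeting.wav", "Kubernetes, etcd, kubelet")` should make the model far less likely to mangle those terms.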
I'll try to understand some of how stuff like faster-whisper works when I've got time over the weekend, but I fear it may be too complex for me...
I was rather hoping for a guide on how to adapt either classic Whisper usage or one of the optimised variants like faster-whisper (which I've just set up in a Docker container, but that used up all the time I had for playing around right now) to take a text prompt along with the audio file.
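For what it's worth, faster-whisper exposes the same knob: its `transcribe()` method accepts an `initial_prompt` argument, so no adaptation of the library itself is needed. A rough sketch, assuming a CUDA GPU and with the model/path names as placeholders:

```python
def transcribe_with_faster_whisper(audio_path: str, prompt: str) -> str:
    """Transcribe with faster-whisper, passing a text prompt for context."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # float16 on CUDA is the usual choice for a 4090; adjust to taste
    model = WhisperModel("large-v2", device="cuda", compute_type="float16")

    # transcribe() returns a generator of segments plus an info object;
    # initial_prompt biases the decoder toward the supplied vocabulary
    segments, _info = model.transcribe(audio_path, initial_prompt=prompt)
    return " ".join(segment.text.strip() for segment in segments)
```

The same `initial_prompt` string you'd send to the paid API should drop straight in here.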
Cheers, I've been wanting to do something with my 4090 other than multi-monitor simulator gaming and quad-screen workstation work - and this will get me started!
The 4090 is an absolute beast: it runs extremely quiet and simply powers through everything. DCS pushes it to the limit, but the resulting experience is simply stunning. Mine's paired with a 7800X3D, which uses hardly any power at all - absolutely love it.
If you're looking for something easy to try out, try my early demo that hooks Whisper to an LLM and TTS so you can have a real time speech conversation with your local GPU that feels like talking to a person! It's way faster than ChatGPT: https://apps.microsoft.com/detail/9NC624PBFGB7
I just can't get it to work, it errors out with 'NotImplementedError: The model type whisper is not yet supported to be used with BetterTransformer.' Did you happen to run into this problem?
Sorry, I didn't encounter that error. It worked on the first try for me. I have wished many times that the ML community didn't settle on Python for this reason...