
userhacker · on May 14, 2024

I just made https://feycher.com, which is similar but also has real-time lip syncing. Let me know if you're interested and we can chat.

userhacker · on Jan 25, 2024

A new Age of Empires game, or any top-down real-time strategy game.

dfex · on Jan 25, 2024

Populous playing out on my coffee table would be awesome. And it would probably be one game where the pinch/tap gestures would work pretty well in lieu of a controller.

userhacker · on Nov 14, 2023

I'm the creator of Revoldiv.com. We do speaker diarization and transcription at the same time. Give it a try.
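The comment doesn't say how the two outputs are combined. One common approach, sketched here as an illustration and not as Revoldiv's actual method, is to label each transcribed word with the diarization segment it overlaps most. The `Word` and `Segment` types and the sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float

def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(words, segments):
    """Label each word with the speaker whose segment overlaps it the most."""
    labeled = []
    for w in words:
        best = max(segments,
                   key=lambda s: overlap(w.start, w.end, s.start, s.end),
                   default=None)
        if best is None or overlap(w.start, w.end, best.start, best.end) == 0:
            labeled.append(("unknown", w.text))  # word falls outside every segment
        else:
            labeled.append((best.speaker, w.text))
    return labeled

words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.4)]
segments = [Segment("A", 0.0, 1.0), Segment("B", 1.1, 2.0)]
print(assign_speakers(words, segments))
# [('A', 'hello'), ('A', 'there'), ('B', 'hi')]
```

Using maximum overlap rather than, say, the word's midpoint makes the assignment robust when a word straddles a speaker change.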

userhacker · on Aug 23, 2023

Try uploading it to https://revoldiv.com/. We pre-process the file to make it a little more intelligible, and you can supply your own context when uploading.

userhacker · on Aug 23, 2023

If you want a quick and free web transcription and editing tool, we've built https://revoldiv.com/ with speaker detection and timestamps. It takes less than a minute to transcribe a 1-hour video/audio file.

beardedwizard · on Aug 23, 2023

Yes but the point of this project is that it doesn't require you to share sensitive data with third parties.

userhacker · on Aug 23, 2023

Good point, but the problem with local hosting is that if you want to use the larger models, it will take a long time to transcribe a file. We use multiple GPUs, we do speaker detection and sound detection, and it has a rich audio editor.

beardedwizard · on Aug 24, 2023

Totally agree; having built a similar app, I know speaker diarization is a killer feature that's hard to get right. My problem is that I'll never share these recordings ;).

userhacker · on March 3, 2023

On the end-user side, Revoldiv.com lets you pick any podcast you want and transcribe it.

userhacker · on March 2, 2023

Nice product! Any integrations planned for JetBrains IDEs?

Buoy · on March 2, 2023

Yes, in the medium term. Fortunately, we built the product in a modular/headless way, so adding further integrations is easier. We're strapped for dev resources right now, though, so once that problem is alleviated we can start looking at supporting more IDEs!

naiv · on March 2, 2023

If it is headless, I assume there is an API that you could release to the public, so companies could build their own IDE plugins or integrate it further into their workflow, like automatically adding code fragments to tickets.

Buoy · on March 3, 2023

Yes, we've definitely considered this as an option. We'll hopefully be able to explore it more when we have more dev resources in the next few months (just my cofounder and I are working on the tech currently). I think it makes a lot of sense to allow people to make their own integrations!

userhacker · on March 1, 2023

I suggest you give revoldiv.com a try. We use Whisper and other models together. You can upload very large files and get an hour-long file transcribed in less than 30 seconds. We use intelligent chunking so that the model doesn't lose context, and we are looking to increase the limit even more in the coming weeks. It's also free to transcribe any video/audio with word-level timestamps.
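The comment doesn't describe how the "intelligent chunking" works. A minimal sketch of one common approach, assuming mono samples in a plain list and a simple RMS-energy test for quiet frames (both assumptions, not Revoldiv's method): aim for a target chunk length, then move each cut to the quietest frame within a search window so cuts tend to land in pauses rather than mid-word.

```python
import math

def frame_rms(samples, frame_len):
    """RMS energy of consecutive, non-overlapping frames (the last may be shorter)."""
    out = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return out

def chunk_at_silence(samples, sample_rate, target_s=30.0, window_s=5.0, frame_s=0.02):
    """Split samples into (start, end) index pairs of roughly target_s seconds.
    Each cut moves to the quietest frame within +/- window_s of the target."""
    frame_len = max(1, int(frame_s * sample_rate))
    rms = frame_rms(samples, frame_len)
    target = max(1, round(target_s * sample_rate / frame_len))  # in frames
    window = round(window_s * sample_rate / frame_len)          # in frames
    cuts, start = [], 0
    while len(rms) - start > target + window:
        lo = max(start + 1, start + target - window)  # always move forward
        hi = start + target + window
        cut = min(range(lo, hi + 1), key=lambda f: rms[f])  # quietest frame
        cuts.append(cut)
        start = cut
    bounds = [0] + cuts + [len(rms)]
    return [(b * frame_len, min(e * frame_len, len(samples)))
            for b, e in zip(bounds, bounds[1:])]
```

For example, on a toy 25-second signal sampled at 100 Hz with a short silence at 11 s, `chunk_at_silence(sig, 100, target_s=10, window_s=2, frame_s=0.1)` puts the first cut at that silence instead of exactly at 10 s. Real systems would typically use a proper VAD or word timestamps instead of raw energy.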

BasilPH · on March 1, 2023

I just gave it a try, and the results are impressive! Do you also offer an API?

graderjs · on March 2, 2023

If you're interested in an offline / local solution: I made a Mac App that uses Whisper.cpp and Voice Activity Detection to skip silence and reduce Whisper hallucinations: https://apps.apple.com/app/wisprnote/id1671480366

If it really works for you, I can add command-line params in an update, so you can use it as a "local API" for free.
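How the app's Voice Activity Detection is implemented isn't stated. As a minimal sketch of the idea, a plain energy threshold (a stand-in for a real VAD model such as WebRTC VAD or Silero) can mark speech frames, bridge short pauses, and drop tiny blips, so only the returned regions would be sent to Whisper. The threshold and timing defaults are hypothetical values.

```python
import math

def speech_regions(samples, sample_rate, frame_s=0.03, threshold=0.02,
                   min_gap_s=0.3, min_speech_s=0.2):
    """Return (start_s, end_s) regions whose frame RMS is >= threshold.
    Pauses shorter than min_gap_s are bridged; regions shorter than
    min_speech_s are dropped as noise."""
    frame_len = max(1, int(frame_s * sample_rate))
    regions = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            start, end = i / sample_rate, (i + len(frame)) / sample_rate
            if regions and start - regions[-1][1] < min_gap_s:
                regions[-1][1] = end  # bridge a short pause
            else:
                regions.append([start, end])
    return [(s, e) for s, e in regions if e - s >= min_speech_s]
```

Skipping the silent stretches both saves compute and avoids feeding Whisper long silences, which is where it tends to hallucinate text.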

userhacker · on March 1, 2023

Contact us at team@revoldiv.com; we are offering an API on a case-by-case basis.

userhacker · on Dec 7, 2022

For revoldiv.com we have profiled many GPUs, and the best one is the RTX 4090. We do a lot of intelligent chunking, detect word boundaries, and run the model in parallel on multiple GPUs; with that we get about 40 to 50 seconds for an hour-long audio file. Without it, expect about 7 minutes for an hour of audio on a Tesla T4 (medium model, per the numbers below).

  Tesla T4 (tesla-t4-30gb-memory-8vcpu, Google Cloud)
    model           10-min audio    60-min audio
    tiny, tiny.en   30 s            -
    medium          1 min 30 s      7 min
    large           -               13 min
  NVIDIA GeForce RTX 4090
    model           10-min audio    60-min audio
    tiny            5.5 s           35 s
    base            7 s             50 s
    small           14 s            1 min 35 s
    medium          26 s            3 min
    large           40 s            3 min 54 s
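To compare these measurements across GPUs and model sizes, it can help to convert each one into a speed-up over real time (audio duration divided by processing time). A small sketch using some of the 60-minute figures quoted above:

```python
# (GPU, model) -> (audio seconds, processing seconds), from the 60-minute numbers above
timings = {
    ("Tesla T4", "medium"): (60 * 60, 7 * 60),
    ("Tesla T4", "large"): (60 * 60, 13 * 60),
    ("RTX 4090", "tiny"): (60 * 60, 35),
    ("RTX 4090", "medium"): (60 * 60, 3 * 60),
    ("RTX 4090", "large"): (60 * 60, 3 * 60 + 54),
}

def speedup(audio_s, proc_s):
    """Seconds of audio transcribed per second of processing."""
    return audio_s / proc_s

for (gpu, model), (audio_s, proc_s) in timings.items():
    print(f"{gpu:9} {model:7} {speedup(audio_s, proc_s):6.1f}x real time")
```

By this measure, the medium model runs at 20x real time on a single 4090 versus roughly 8.6x on the T4.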

getcrunk · on Dec 7, 2022

That's crazy. 35 seconds for 60 minutes on tiny with a 4090, wow! Thanks for the info! Also, I mentioned on another thread that I was getting an error. But are you planning on offering this as a paid API?

userhacker · on Dec 7, 2022

Can you send me the audio that caused it? You can email me at team AT revoldiv .com. If there turns out to be a lot of interest, yes, we can provide it as an API service. Our service has some niceties like word-level timestamps, paragraph separation, sound detection, etc. For now it is a free service you can use as much as you want.

userhacker · on Dec 7, 2022

I recently replaced the AI model for voice transcription on revoldiv.com with Whisper. The results have been truly impressive: even the smaller models outperform and generalize better than any other options on the market. If you want to give it a try, our service transcribes faster by using multiple GPUs and some other enhancements, and it's all free.
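The post doesn't say how work is spread across GPUs. A minimal sketch, assuming the audio is already split into chunks and using a hypothetical `transcribe_on_gpu(chunk, gpu_id)` placeholder in place of a real Whisper call: one worker thread per GPU pulls chunks from a shared queue, and the results are reassembled in the original order.

```python
import queue
import threading

def transcribe_on_gpu(chunk, gpu_id):
    """Hypothetical placeholder: a real version would run Whisper on device gpu_id."""
    return f"<text of {chunk}>"

def transcribe_parallel(chunks, num_gpus):
    """Transcribe chunks concurrently with one worker per GPU, preserving order."""
    jobs = queue.Queue()
    for index, chunk in enumerate(chunks):
        jobs.put((index, chunk))
    results = [None] * len(chunks)

    def worker(gpu_id):
        while True:
            try:
                index, chunk = jobs.get_nowait()
            except queue.Empty:
                return  # no work left for this GPU
            results[index] = transcribe_on_gpu(chunk, gpu_id)

    threads = [threading.Thread(target=worker, args=(g,)) for g in range(num_gpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return " ".join(results)

print(transcribe_parallel(["c0", "c1", "c2"], num_gpus=2))
# <text of c0> <text of c1> <text of c2>
```

A shared queue, rather than a fixed round-robin split, lets a GPU that finishes early pick up the remaining chunks.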

thundergolfer · on Dec 7, 2022

Your app is very similar to our demo app! https://modal-labs-whisper-pod-transcriber-fastapi-app.modal...

How come you don't support audio files longer than 1hr? Is it because of $$ cost?

The above demo app gets faster transcription by chunking audio and parallelizing over dozens of CPUs, so you can transcribe about 1 hr of audio for $0.10.
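Why fanning out over many CPUs speeds things up without raising the bill comes down to simple arithmetic: total compute stays roughly the same and is just split across workers. A sketch with hypothetical numbers (the CPU-seconds and per-CPU-second price are placeholders, not Modal's actual workload or pricing):

```python
def fan_out(total_cpu_seconds, workers, price_per_cpu_second):
    """Ideal parallel split: wall-clock time shrinks with more workers, cost does not.
    Ignores per-chunk startup overhead and uneven chunk sizes."""
    wall_clock = total_cpu_seconds / workers
    cost = total_cpu_seconds * price_per_cpu_second
    return wall_clock, cost

# Hypothetical: 1 hour of audio needs 3000 CPU-seconds at $0.00003 per CPU-second.
for workers in (1, 10, 50):
    wall, cost = fan_out(3000, workers, 0.00003)
    print(f"{workers:>2} workers: {wall:7.1f} s wall clock, ${cost:.2f}")
```

In practice each extra chunk adds some startup and boundary overhead, so cost creeps up slightly as the number of workers grows.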

userhacker · on Dec 7, 2022

> https://modal-labs-whisper-pod-transcriber-fastapi-app.modal...

Interesting, which model are you using? We use the medium model, which is the sweet spot for the time/performance trade-off. We also chunk: we try to detect words and silences so we can cut at word boundaries, but if you chunk more aggressively and don't get the word boundaries right, Whisper seems to lose some context and accuracy suffers. We will soon support longer files; we just want to make sure the wait time for transcription doesn't suffer for most users. Great demo, though. Reach out to me if you want to collaborate.
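A minimal sketch of cutting at word boundaries, assuming word timestamps from an earlier pass are available as sorted `(start, end)` pairs (an assumption; the comment doesn't say where its boundaries come from). For each target cut it picks the widest gap between words inside a search window, breaking ties by closeness to the target, so no word is split across chunks.

```python
def cut_points(words, target_s=30.0, window_s=5.0):
    """words: sorted (start, end) times in seconds.
    Returns cut times placed in the widest inter-word gap near each target."""
    # Each gap is (midpoint, width) between consecutive words.
    gaps = [((a_end + b_start) / 2, b_start - a_end)
            for (_, a_end), (b_start, _) in zip(words, words[1:])]
    end_of_audio = words[-1][1] if words else 0.0
    cuts, last = [], 0.0
    while end_of_audio - last > target_s + window_s:
        target = last + target_s
        nearby = [g for g in gaps if g[0] > last and abs(g[0] - target) <= window_s]
        if nearby:
            # Widest gap wins; ties go to the gap closest to the target.
            cut = max(nearby, key=lambda g: (g[1], -abs(g[0] - target)))[0]
        else:
            cut = target  # no gap nearby: fall back to a hard cut
        cuts.append(cut)
        last = cut
    return cuts
```

Preferring wide gaps approximates "cut at a pause", which is where losing cross-chunk context hurts the least.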

thundergolfer · on Dec 7, 2022

We’re using base. The code is open source in the modal-labs/modal-examples repo if you want to see anything we’re doing.

71a54xd · on Dec 7, 2022

I'd love to try it out, who should I ping to run this locally on my GPU server?

userhacker · on Dec 7, 2022

It's not a model you can run on your own server but a free service on revoldiv.com. You can expect a 40 to 50 second wait to transcribe an hour-long video/audio file. We combine Whisper with our own model to get word-level timestamps, paragraph separation, and sound detection (laughter, music, etc.). We recently added very basic podcast search and transcription.
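How the paragraph separation works isn't described. One simple heuristic, sketched here with that caveat, builds on word-level timestamps: start a new paragraph wherever the pause between consecutive words exceeds a threshold (the 1.5 s default is a hypothetical value).

```python
def paragraphs(words, pause_s=1.5):
    """words: list of (text, start, end) in seconds. Split at pauses longer than pause_s."""
    paras, current = [], []
    prev_end = None
    for text, start, end in words:
        if current and start - prev_end > pause_s:
            paras.append(" ".join(current))  # long pause: close the paragraph
            current = []
        current.append(text)
        prev_end = end
    if current:
        paras.append(" ".join(current))
    return paras

words = [("Welcome", 0.0, 0.5), ("back.", 0.6, 1.0), ("Today", 3.0, 3.4), ("we", 3.5, 3.6)]
print(paragraphs(words))
# ['Welcome back.', 'Today we']
```

A production system would likely also consider speaker changes and sentence punctuation, not pauses alone.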

getcrunk · on Dec 7, 2022

I'm getting an unknown error, error code 551.