Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
#1 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
Standalone Faster-Whisper - AI auto-transcription-translation
Whisper is a state-of-the-art auto-transcription and translation model, introduced in "Robust Speech Recognition via Large-Scale Weak Supervision".
Here are my compiled binaries for newbies: https://github.com/Purfview/whisper-standalone-win
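For orientation, a typical invocation of a standalone build looks something like the sketch below. The flag names mirror the OpenAI Whisper CLI that these builds follow, but they are assumptions here; check the release's `--help` output for the authoritative list.

```python
import subprocess  # needed if you uncomment the run line below

def build_whisper_cmd(exe, audio, model="medium", language="en",
                      device="cuda", output_format="srt"):
    """Assemble a command line for a standalone whisper executable.

    Flag names are assumptions based on the OpenAI Whisper CLI;
    verify against your release's --help output.
    """
    return [exe, audio,
            "--model", model,
            "--language", language,
            "--device", device,
            "--output_format", output_format]

cmd = build_whisper_cmd("whisper-faster.exe", "movie.mkv")
# subprocess.run(cmd, check=True)  # uncomment to actually transcribe
print(" ".join(cmd))
```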
__________________
InpaintDelogo, DoomDelogo, JerkyWEB Fixer, Standalone Faster-Whisper - AI subtitling Last edited by VoodooFX; 8th November 2023 at 15:16. |
#2 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,108
|
WOW! VX, you be da man.
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 28th April 2023 at 15:17. |
#3 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
@StainlessS How is the internet at the pub, did you download a release with GPU?
#4 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,108
|
Not yet, I thought it seemed a little bit weird, what with all of the various downloads necessary.
I did not have a clue what to do with the stuff on the GitHub site, there did not seem to be anything downloadable. However, I did find some stuff here: https://github.com/openai/whisper/discussions/63 which would seem to be the model thingies. Maybe I'll download them in the pub, but I'm in no great hurry at the moment. I'll also download the GPU thingy. EDIT: Yes, I know it can auto-download the models, but I want an offline download and want to know where they come from.
#5 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
Model[s] are downloaded separately, automatically or manually; the link for the models is on the front page of the repo. You don't want the "OpenAI" stuff, it's very slow.
#6 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,108
|
OK, I got it working [with auto-download of the model; I had downloaded the PyTorch models (.pt extension) by mistake, in the pub].
I tested both the CPU and GPU versions; the GPU is pleasantly faster: a 01:33:xx movie under the GPU/medium.en model took 269 seconds for 933 subtitles. Thanks for prodding me in this direction.
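For scale, a 01:33:00 runtime transcribed in 269 seconds works out to roughly 20× realtime; a quick back-of-the-envelope check:

```python
def realtime_factor(h, m, s, wall_seconds):
    """Seconds of audio processed per second of wall-clock time."""
    return (h * 3600 + m * 60 + s) / wall_seconds

# 01:33:00 of audio transcribed in 269 s (GPU, medium.en model)
print(f"{realtime_factor(1, 33, 0, 269):.1f}x realtime")  # → 20.7x realtime
```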
#7 | Link |
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,888
|
CPU: I get
Code:
2023-05-04 23:28:28.0660903 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
CUDA: the same report, then it starts munching... Still, it repeats some lines over and over, but WOW IS THAT QUICK! Like 10-fold going from a .wav file. 635 s for a 1:43:40 musical movie in English, 1769 subs, songs included. 103 minutes of playing time transcribed in 10.5 minutes with ~95% accuracy. Found slang stuff I was helpless to translate before. CPU 11900K at 17%, GPU RTX 3080 at 1%. Awesome work, many thanks, VoodooFX!
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain) "Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..." Last edited by Emulgator; 4th May 2023 at 23:09. |
#8 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
That message from onnxruntime is "normal", expected nonsense from Microsoft's lib.
Can you cut and share a shorter sample of the audio where it happens? [Test whether the cut sample still loops.] EDIT: Use the original audio, don't convert it to anything; results from an ffmpeg conversion are noticeably worse for some reason. EDIT2: Btw, some people have reported that on CPU it's a bit more accurate.
#9 | Link |
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,888
|
No time to continue right now, but I guess it will be the same behaviour as with the other Whisper versions within Subtitle Edit:
if I just restrict it and give only a smaller range, it works most of the time.
#10 | Link |
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,888
|
Now let's have this engine on Android (I once had to help a hearing-disabled person using their mobile phone for live transcription),
and finally Big Goo's online-listening would be out of the water...
#11 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,108
|
into two shorter GPU subs. [I think] EDIT: I wonder how it would do on the movie Snatch, with Brad Pitt's Irish gypsy accent [a fantastic job by Pitt to make it totally unintelligible]. Snatch/Pitt: https://www.youtube.com/watch?v=Gfzxz7asbZs
#12 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
On English audio I get better results with the multilingual "medium" model than with "medium.en".
@Emulgator @StainlessS Could you test whether the new Faster-Whisper r117 runs OK on CUDA? [The executable is small.] I'd also be interested in a transcription accuracy and speed benchmark vs the previous version.
#13 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,108
|
cuBLAS 11.x @ 3.4GB, and cuDNN 8.x @ unknown size (I have an Nvidia devs account somewhere, I'll have to find it). I'm currently on 50GB of data per month [EE @ £20/month]; I think maybe I'll up it to 130GB/month next month [I think the next tier up is 130GB for £30/month]. Anyway, I'll download them in the pub next time I'm there. EDIT: The models were updated 22 hours ago. EDIT: Actually EE is 125GB for £30/month for my 4G+ router. https://shop.ee.co.uk/sim-only/pay-as-you-go-phones#
#14 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
EDIT: Actually, all that stuff should be present in the previous version; copying the DLLs to the same folder from there should work.
#15 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
*Full 4G speed till 80GB; after that it limits to 386 kb at daytime, but at night it's full speed again.
#16 | Link |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 11,108
|
Thanx VFX, but I'll stick with the faster EE; it says in my given link that the speed is max 25 Mbps.
Ethernet, USB, etc. tend to have a management overhead of about 20%, and I assume the same for 4G, so to convert Mbps to MB/s, just divide by 10. Despite what EE says (max 25 Mbps) I regularly get 3 or 4 MB/s (as for 30 or 40 Mbps), and have once seen 8 MB/s during the night. That's pretty good speed considering that I'm quite a way from the city urban area (I'm near green parkland) and only get 2 out of 5 bars of signal. EDIT: I presume that we get charged for the management overhead, and that it is included in the max 25 Mbps.
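The divide-by-ten rule of thumb (8 bits per byte plus ~20% overhead) can be sketched as:

```python
def mbps_to_MBps(mbps, overhead=0.20):
    """Convert a link rate in Mbit/s to usable MB/s.

    Assumes ~20% framing/management overhead, so 25 Mbps ~ 2.5 MB/s,
    i.e. the 'divide by 10' rule of thumb.
    """
    return mbps * (1 - overhead) / 8

print(mbps_to_MBps(25))  # → 2.5
```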
#17 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
@StainlessS
I could make a release [or a separate download] including the Nvidia libs, but I don't know which libs are actually needed, and I don't want to include the whole 4GB of stuff. Could someone check which libs are needed by copying the DLLs [to the same folder as whisper.exe from "r117"] one by one on each error, from the "b103" release, where the libs are located at "Whisper-Faster\torch\lib"? That would need a Windows install with only the Nvidia drivers, without the CUDA Toolkit & cuDNN.
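The trial-and-error hunt could be scripted; a minimal sketch, where the candidate DLL list is illustrative and the actual load test only runs on Windows (on other systems only the existence check applies):

```python
import ctypes
import os
import platform

def probe_dlls(dll_names, folder):
    """Return the DLLs that are absent from `folder` or fail to load.

    On Windows, ctypes.WinDLL also surfaces missing transitive
    dependencies (the "error code 126" case); elsewhere only file
    existence is checked.
    """
    missing = []
    for name in dll_names:
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            missing.append(name)
            continue
        if platform.system() == "Windows":
            try:
                ctypes.WinDLL(path)
            except OSError:
                missing.append(name)
    return missing

# Candidate libs copied from Whisper-Faster\torch\lib (illustrative list)
candidates = ["cublas64_11.dll", "cublasLt64_11.dll",
              "cudnn_ops_infer64_8.dll", "cudnn_cnn_infer64_8.dll",
              "zlibwapi.dll"]
print(probe_dlls(candidates, "."))
```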
#18 | Link |
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,888
|
If swapped into the same python-laden folder as r103, r117 throws an error:
"Could not load cudnn_ops_infer64_8.dll. Error code 126. Make sure that cudnn_ops_infer64_8.dll is in your path."
That file was indeed in torch\lib, together with 36 more .dlls. I copied it side by side into r117, then:
"Could not load cudnn_cnn_infer64_8.dll. Error code 126. Make sure that cudnn_cnn_infer64_8.dll is in your path."
I copied it side by side, then still the same fault.
Win10P64, i9-11900K, RTX3080, cudart64_110.dll 6.14.11.11080 in system32.
P.S. This is my replacement system, so no CUDA 11.8 installed yet.
#19 | Link |
Banana User
Join Date: Sep 2008
Posts: 1,116
|
@Emulgator Thanks for testing it.
r117 is a standalone single executable; there's no need to copy it into the r103 folder. [It's not an incremental patch.] "cudnn_cnn_infer64_8.dll" has a dependency on "zlibwapi.dll", so copy that too.
#20 | Link |
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,888
|
Ah, OK. Continuing.
Removed cudart64_110.dll 6.14.11.11080 from system32 (still no CUDA 11.8 installed). Now running from a separate folder containing only the .exe, the .bat and the _models folder, and adding dependencies as we speak.
We had cudnn_ops_infer64_8.dll and cudnn_cnn_infer64_8.dll. You hinted at zlibwapi.dll; I added that. Then it asked for cublasLt64_11.dll; I added that. Then it asked for cublas64_11.dll; I added that. Start: success! Now it runs with just these 5 dependencies.
14..17..18% on CPU, 3 of 16 cores (1 at 90%, 1 at 66%, 1 at 33%), 1% GPU. Load distribution looks the same as with r103.
Speed: will tell when it is finished. Feels in the same range as r103. Then I will compare r103 vs. r117 apples-to-apples.
Finished: 949 s for the same movie, 50% slower. It was stuck quite a bit at 01:37:40.700 (a song beginning); subtitle generation ended there at #1757, so it did not reach the movie's end at 01:43:40. Repeated r103: 633 s for the same movie, and it ran until the end. At the moment r103, within its full dependency bag, looks better.
Nvidia driver on this (older, clone-father, now replacement) system SSD: 462.75