Doom9's Forum > General > Subtitles
Old 28th April 2023, 13:00   #1  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
Standalone Faster-Whisper - AI auto-transcription-translation

Whisper is a state-of-the-art auto-transcription-translation model - Robust Speech Recognition via Large-Scale Weak Supervision

Here are my compiled binaries for newbies: https://github.com/Purfview/whisper-standalone-win

Last edited by VoodooFX; 8th November 2023 at 15:16.
Old 28th April 2023, 15:14   #2  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
WOW! VX, you be da man.
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 28th April 2023 at 15:17.
Old 1st May 2023, 13:09   #3  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
@StainlessS How is the internet at the pub, did you download a release with GPU?

Last edited by VoodooFX; 1st May 2023 at 13:31.
Old 1st May 2023, 18:03   #4  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
Not yet, I thought that it seemed a little bit weird, what with all of the various downloads necessary.
I did not have a clue what to do with stuff on the github site, did not seem to be anything downloadable.
However, I did find some stuff here:- https://github.com/openai/whisper/discussions/63
which would seem to be the model thingies.
Maybe I down them in pub, but in no great hurry at the moment.
I'll also down the GPU thingy.

EDIT: Yes I know it can auto download the models, but I want offline download and want to know where they come from.
Last edited by StainlessS; 1st May 2023 at 18:17.
Old 1st May 2023, 18:53   #5  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by StainlessS View Post
I did not have a clue what to do with stuff on the github site, did not seem to be anything downloadable.
It seemed that you knew where to download:

Quote:
Originally Posted by StainlessS View Post
EDIT: What kind of speed can one expect (I downed the 170MB-ish CPU version, 1.6-ish GB for GPU ver$ is a bit rich for me, is that much faster ?) ?.
Anyway, on GitHub you should see a "Releases" button on the right side.

Models are downloaded separately, automatically or manually; the link for the models is on the front page of the repo.

You don't want "OpenAI" stuff, it's very slow.
Old 4th May 2023, 05:24   #6  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
OK, I got it working [with auto download of the model; in the pub I had downloaded the PyTorch models [.pt extension] by mistake].

I tested with both CPU and GPU versions, GPU pleasantly faster,
a 01:33:xx movie under GPU/medium.en model took 269 seconds for 933 subtitles.

Thanks for prodding me in this direction.
Last edited by StainlessS; 5th May 2023 at 09:07.
Old 4th May 2023, 22:30   #7  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
CPU: I get
Code:
2023-05-04 23:28:28.0660903 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1671 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
but after that recognition starts. Aborted manually to see CUDA blow...

CUDA: the same report, then it starts munching...

Still it repeats some lines over and over, but
WOW IS THAT QUICK ! Like 10-fold going from a .wav file.
635s for a 1:43:40 musical movie in English, 1769 subs, songs included.
103 min playing time transcribed in 10.5 min with ~95% accuracy.
Found slang stuff I was helpless to translate before.
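
As a rough check of the figures above, the speed-up over real time can be worked out from the two durations. This is only a sketch; the helper names are made up for illustration and are not part of Faster-Whisper.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' playing time to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def realtime_factor(playing_time: str, processing_seconds: float) -> float:
    """Seconds of audio handled per second of processing."""
    return hms_to_seconds(playing_time) / processing_seconds


if __name__ == "__main__":
    # Figures from the post: a 1:43:40 movie transcribed in 635 s.
    factor = realtime_factor("1:43:40", 635)
    print(f"{factor:.1f}x real time")  # prints "9.8x real time"
```

So the "like 10-fold" impression matches the numbers: 6220 s of audio in 635 s of processing.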

CPU 11900K at 17%, GPU RTX3080 at 1%. How, well...

Awesome work, many thanks, VoodooFX !
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..."

Last edited by Emulgator; 4th May 2023 at 23:09.
Old 4th May 2023, 23:09   #8  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
That message from onnxruntime is "normal", expected nonsense from Microsoft's lib.

Quote:
Originally Posted by Emulgator View Post
Still it repeats some lines over and over
Can you cut and share a shorter sample of that audio where it happens? [Test whether the cut sample still loops]

EDIT:

Quote:
Originally Posted by Emulgator View Post
going from a .wav file
Use the original audio, don't convert it to anything; results from ffmpeg conversion are noticeably worse for some reason.

EDIT2:

Btw, some people reported that on CPU it's a bit more accurate.

Last edited by VoodooFX; 4th May 2023 at 23:46.
Old 4th May 2023, 23:36   #9  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
No time to continue right now, but I guess it will be the same behaviour as with the other Whisper versions within SubtitleEdit:
if I restrict it to a smaller range, it works most of the time.
Old 5th May 2023, 08:14   #10  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Now let's have this engine on Android (I once had to help a hearing-disabled person using their mobile phone for live transcription),
and finally Big Goo's online-listening would be out of the water...
Old 5th May 2023, 09:12   #11  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
Quote:
Btw, some people reported that on CPU it's a bit more accurate.
My CPU and GPU outputs were definitely different, the GPU run quite often split a longer single CPU sub
into two shorter subs. [I think]

EDIT:
I wonder how it would do on the movie SNATCH, Brad Pitt's Irish gypsy accent [fantastic job by Pitt to make it totally unintelligible].

Snatch/Pitt: https://www.youtube.com/watch?v=Gfzxz7asbZs
Last edited by StainlessS; 5th May 2023 at 09:18.
Old 13th May 2023, 06:57   #12  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
On English audio I get better results with the multilingual "medium" model than with "medium.en".


@Emulgator
@StainlessS
Could you test if the new Faster-Whisper r117 runs OK on CUDA? [Executable is small]
And I would be interested in a transcription accuracy and speed benchmark vs the previous version.

Last edited by VoodooFX; 13th May 2023 at 07:01.
Old 13th May 2023, 18:05   #13  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
Quote:
[Executable is small]
Yeah, maybe, but requires
cuBLAS 11.x @ 3.4GB, and
cuDNN 8.x @ unknown size (I have an nVidia devs account somewhere, I'll havta find it).
I'm currently on 50GB data per month [EE @ £20/month], I think maybe I'll up it to 130GB/month next month [think next up is 130GB for £30/month].
Anyway, I'll down them in pub next time I'm there.

EDIT: The models were updated 22 hours ago.
EDIT: Actually EE 125GB for £30/month for my 4G+ Router. https://shop.ee.co.uk/sim-only/pay-as-you-go-phones#
Last edited by StainlessS; 13th May 2023 at 18:14.
Old 13th May 2023, 19:56   #14  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by StainlessS View Post
Yeah, maybe, but requires
cuBLAS 11.x @ 3.4GB, and
cuDNN 8.x @ unknown size (I have an nVidia devs account somewhere, I'll havta find it).

EDIT: The models were updated 22hours ago.
Maybe it would work out of the box. Shouldn't that be already installed with CUDA drivers/stuff?

EDIT:
Actually, all that stuff should be present in the previous version; copying the dlls from there to the same folder should work.

Last edited by VoodooFX; 15th May 2023 at 17:23.
Old 13th May 2023, 20:02   #15  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
Quote:
Originally Posted by StainlessS View Post
I'm currently on 50GB data per month [EE @ £20/month], I think maybe I'll up it to 130GB/month next month [think next up is 130GB for £30/month].
Get GiffGaff PAYG, unlimited* for £25. [GiffGaff is basically O2]

*Full 4G speed till 80GB, after that it limits to 386kb at daytime, but at night it's full speed again.

Last edited by VoodooFX; 13th May 2023 at 20:14.
Old 13th May 2023, 21:13   #16  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
Thanx VFX, but I'll stick with faster EE, says in my given link that speed is max 25mbps.
EtherNet, USB etc, tend to have a management overhead of 20%, I assume same for 4G,
So to convert mbps to MB/s, just divide by 10.
Despite what EE says (max 25 mbps) I regularly get 3 or 4 MB/s (as for 30 or 40 mbps), and have
once noticed it at 8MB/s during the night.
That's pretty good speed considering that I'm quite a way from city urban area (I'm near green parkland),
and only get 2 out of 5 bar signal.
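
The divide-by-10 rule of thumb above can be written out as a small sketch. The 20% overhead is the assumption stated in the post, not a measured figure, and the function name is made up for illustration.

```python
def mbps_to_mbytes_per_s(mbps: float, overhead: float = 0.20) -> float:
    """Convert a link rate in megabits/s to usable megabytes/s.

    overhead is the assumed fraction of the link spent on protocol
    management (20% in the post above).
    """
    usable_mbps = mbps * (1.0 - overhead)
    return usable_mbps / 8.0  # 8 bits per byte


if __name__ == "__main__":
    # With 20% overhead, (mbps * 0.8) / 8 is the same as mbps / 10.
    for rate in (25, 30, 40):
        print(f"{rate} mbps -> about {mbps_to_mbytes_per_s(rate):.1f} MB/s")
```

By this estimate the advertised 25 mbps would give about 2.5 MB/s, so 3 or 4 MB/s corresponds to roughly 30 or 40 mbps.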

EDIT: I presume that we get charged for the management overhead, and that it is included in max 25 mbps.

Quote:
*Full 4G speed till 80GB, after that it limits to 386kb at daytime, but at night it's full speed again.
Did not know that, thanx.
Last edited by StainlessS; 13th May 2023 at 21:28.
Old 15th May 2023, 18:10   #17  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
@StainlessS
I could make a release [or a separate download] that includes the Nvidia libs, but I don't know which libs are actually needed, and I don't wanna include the whole 4GB of stuff.
Could someone check which libs are actually needed? Put whisper.exe from "r117" in a folder, then on each error copy the named dll into that folder from the "b103" release, where the libs are located at "Whisper-Faster\torch\lib".
That would need Windows with only the Nvidia drivers installed, without CUDA Toolkit & cuDNN.
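
The copy-on-error loop could be scripted roughly like this. It is a sketch under assumptions: it supposes the executable names the missing library in a message of the form "Could not load <name>.dll" (the wording reported further down this thread), and the function names and folder arguments are hypothetical.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

# Assumed wording of the error: 'Could not load <name>.dll'.
_MISSING_DLL = re.compile(r"Could not load ([\w.-]+?\.dll)", re.IGNORECASE)


def missing_dll_from_error(error_text: str) -> Optional[str]:
    """Return the dll name mentioned in the error text, or None."""
    match = _MISSING_DLL.search(error_text)
    return match.group(1) if match else None


def copy_missing_dll(error_text: str, libs_dir: Path, exe_dir: Path) -> Optional[str]:
    """Copy the dll named in the error from libs_dir into exe_dir.

    Returns the copied file name, or None if the text names no dll.
    """
    name = missing_dll_from_error(error_text)
    if name is None:
        return None
    source = libs_dir / name
    if not source.is_file():
        raise FileNotFoundError(f"{name} not found in {libs_dir}")
    shutil.copy2(source, exe_dir / name)
    return name
```

Run the executable, feed its error output to `copy_missing_dll`, and repeat until no dll is named. Note that a library's own dependencies may not be named in the error, so some files may still have to be found by hand.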
Old 16th May 2023, 16:47   #18  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
If swapped into the same python-laden folder as r103, r117 throws error:
"Could not load cudnn_ops_infer64_8.dll. Error code 126.
Make sure that cudnn_ops_infer64_8.dll is in your path."
That file was indeed in torch\lib, together with 36 more .dlls.
I copied that side by side to 117, then
"Could not load cudnn_cnn_infer64_8.dll. Error code 126.
Make sure that cudnn_cnn_infer64_8.dll is in your path."
I copied it side by side, then still the same fault.

Win10P64, i9-11900K, RTX3080, cudart64_110.dll 6.14.11.11080 in system32.
P.S. This is my replacement system, so no CUDA 11.8 installed yet.
Last edited by Emulgator; 16th May 2023 at 17:01.
Old 16th May 2023, 18:49   #19  |  Link
VoodooFX
Banana User
 
Join Date: Sep 2008
Posts: 985
@Emulgator Thanks for testing it.

r117 is a standalone single executable, no need to copy it to the r103 folder. [It's not some incremental patch]

"cudnn_cnn_infer64_8.dll" has a dependency on "zlibwapi.dll", so copy that too.

Last edited by VoodooFX; 16th May 2023 at 18:54.
Old 17th May 2023, 15:17   #20  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,531
Ah, ok. Continuing.
Removed cudart64_110.dll 6.14.11.11080 from system32,
(still no CUDA 11.8 installed).

Now running from a separate folder, containing only the .exe, the .bat and the _models folder,
and adding dependencies as we speak.

We had
cudnn_ops_infer64_8.dll
cudnn_cnn_infer64_8.dll

You hinted
zlibwapi.dll
I added that.

Now it asked for
cublasLt64_11.dll
I added that.

Then it asked for
cublas64_11.dll
I added that.

Start: Success ! Now it runs with just these 5 dependencies.
14..17..18% on CPU, 3 of 16 cores (1 is 90%, 1 is 66%, 1 is 33%), 1% GPU.
Load distribution looks the same as with r103.
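
The five files named above can be checked for in one go. A minimal sketch, assuming the dll names are exactly those listed in this post; the function names are hypothetical and not part of whisper.exe.

```python
from pathlib import Path
from typing import Iterable, List

# The five dependencies found by trial and error in this post.
REQUIRED_DLLS = (
    "cudnn_ops_infer64_8.dll",
    "cudnn_cnn_infer64_8.dll",
    "zlibwapi.dll",
    "cublasLt64_11.dll",
    "cublas64_11.dll",
)


def missing_from(names: Iterable[str]) -> List[str]:
    """Return the required dlls absent from the given file names.

    The comparison ignores case, as Windows file names do.
    """
    present = {name.lower() for name in names}
    return [dll for dll in REQUIRED_DLLS if dll.lower() not in present]


def missing_in_folder(folder: str) -> List[str]:
    """Return the required dlls that are not files in the folder."""
    return missing_from(p.name for p in Path(folder).iterdir() if p.is_file())


if __name__ == "__main__":
    # "." is a placeholder; point it at the folder holding the .exe.
    for dll in missing_in_folder("."):
        print("missing:", dll)
```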

Speed: will tell when it is finished. Feels the same range as with r103.
Then I will compare again r103 vs. r117 apples-to-apples.

Finished: 949s for the same movie. 50% slower.
Was stuck quite a bit at 01:37:40.700 (song beginning)
generating of subtitles ended there at #1757,
so it did not reach the movie's end at 01:43:40

Repeated r103: 633s for the same movie, ran until the end.
At the moment r103 within its full dependency bag looks better.
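
For the record, the two comparisons above work out as follows. A small sketch using only the figures from this post; the helper names are made up for illustration.

```python
def to_seconds(timestamp: str) -> float:
    """Convert 'HH:MM:SS' or 'HH:MM:SS.mmm' to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def slowdown_percent(new_seconds: float, old_seconds: float) -> float:
    """How much longer the new run took, as a percentage of the old run."""
    return (new_seconds / old_seconds - 1.0) * 100.0


def coverage_percent(stopped_at: str, movie_end: str) -> float:
    """Share of the playing time reached before subtitles stopped."""
    return to_seconds(stopped_at) / to_seconds(movie_end) * 100.0


if __name__ == "__main__":
    # r117 took 949 s, the repeated r103 run took 633 s.
    print(f"slowdown: {slowdown_percent(949, 633):.1f}%")  # prints "slowdown: 49.9%"
    # r117 stopped at 01:37:40.700 of a 01:43:40 movie.
    print(f"coverage: {coverage_percent('01:37:40.700', '01:43:40'):.1f}%")  # prints "coverage: 94.2%"
```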

nVidia Driver on this (older, clone father, now replacement) system SSD: 462.75
Last edited by Emulgator; 17th May 2023 at 16:24.

Tags
audio, openai, speech, subtitles, text
