Old 7th January 2019, 22:12   #761  |  Link
FLX90
Registered User
 
Join Date: Nov 2018
Posts: 33
Currently I'm using Tesseract 4.00 with the engine mode "Tesseract + LSTM".

Unfortunately I have to do a massive amount of post-correction.
'I' is sometimes recognized as 'L', 'l' or '!'.
The music symbol isn't recognized correctly (it comes out as a P or D).
And I don't know what else I may have overlooked.

Is there anything I could improve in my settings?

EDIT:
I'm using binary image compare now and creating a new database for every Blu-ray.
That works.

Last edited by FLX90; 7th January 2019 at 23:53.
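
For the 'I' read as 'l' problem described in the post above, a small post-processing pass over the OCR text can catch the most common English cases. This is only a heuristic sketch in Python, not something Tesseract or Subtitle Edit provides; the function name `fix_pronoun_i` and both patterns are assumptions made for illustration, and the music symbol misreads are not handled.

```python
import re

# Heuristic patterns (assumed for illustration, English text only).
FIXES = [
    # "l'm", "l'll", "l've", "l'd" at the start of a word -> "I'm", "I'll", ...
    (re.compile(r"\bl(?='(?:m|ll|ve|d)\b)"), "I"),
    # A lone "l" standing as its own word -> "I".
    # The apostrophe checks leave elisions such as "l'amour" untouched.
    (re.compile(r"(?<![\w'])l(?![\w'])"), "I"),
]


def fix_pronoun_i(text: str) -> str:
    """Replace a lowercase 'l' that was most likely meant to be the pronoun 'I'."""
    for pattern, replacement in FIXES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(fix_pronoun_i("l'm sure l can help."))
```

A pass like this is best run on a copy of the file and reviewed afterwards, since a lone "l" can be legitimate in some texts.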
Old 14th January 2019, 02:04   #762  |  Link
johner23
Registered User
 
Join Date: Jul 2003
Location: Brazil
Posts: 233
Hello, dear all.

I got the newest version of Subtitle Edit and tried to convert a subtitle from one format to another.

The original subtitle is in *.vtt format. The output should be in *.srt format.

But after the conversion, the srt subtitle shows some strange errors.

Eg:

Quote:
00:01:04,562 --> 00:01:08,908
{\an3}>> O que mais lhe
interessa na história?
The symbols {\an3} and >> are not correct.

Do you know how to fix that?

I can make manual corrections after the conversion to srt, but I would prefer that the program get it right on the first attempt, without any "errors" or strange "symbols" afterwards.

Thanks for your time.

Best regards.

Last edited by johner23; 14th January 2019 at 02:07.
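
As background to the question above: `{\an3}` is an ASS-style alignment tag (numpad position 3, bottom right), and `>>` is commonly used in captions to mark a change of speaker, so both probably come from the source file rather than being conversion errors. If they are unwanted in the srt, they can be stripped afterwards. The following is a minimal Python sketch; the function name `clean_cue_text` and the two patterns are assumptions for illustration, not a Subtitle Edit feature.

```python
import re

# Matches ASS-style alignment tags such as {\an3} (numpad positions 1-9).
ALIGN_TAG = re.compile(r"\{\\an[1-9]\}")
# Matches a ">>" speaker marker at the start of a line, plus surrounding spaces.
# Timing lines are safe: their "-->" is never at the start of the line.
SPEAKER_MARK = re.compile(r"^[ \t]*>>+[ \t]*", re.MULTILINE)


def clean_cue_text(text: str) -> str:
    """Remove alignment tags and leading '>>' markers from subtitle text."""
    text = ALIGN_TAG.sub("", text)
    text = SPEAKER_MARK.sub("", text)
    return text


if __name__ == "__main__":
    cue = "{\\an3}>> O que mais lhe\ninteressa na história?"
    print(clean_cue_text(cue))
```

Removing `{\an3}` also discards the positioning information, so the cue will be shown at the player's default position.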
Old 27th January 2019, 19:12   #763  |  Link
Hendi
Registered User
 
Join Date: Jan 2019
Posts: 1
Only the first text is converted in batch mode

Hi

If I convert from sub to srt in batch mode (/convert), the resulting srt file contains all the timeframes, but only the first one contains text. If I do this with the GUI, the whole text is converted and all timeframes contain text.

What am I doing wrong?

Thanks

Hendi
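
A symptom like the one described above (timing lines present, text missing) is easy to detect automatically after a batch run. The following Python sketch reports which cues in an srt file have no text; the function name `empty_cues` is hypothetical, and the parser assumes well-formed SubRip timing lines.

```python
import re
from typing import List

# A SubRip timing line, e.g. "00:01:04,562 --> 00:01:08,908".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def empty_cues(srt_text: str) -> List[int]:
    """Return the 1-based positions of cues whose timing line is followed by no text."""
    lines = srt_text.replace("\r\n", "\n").split("\n")
    empty = []
    position = 0
    for i, line in enumerate(lines):
        if TIMING.match(line.strip()):
            position += 1
            following = lines[i + 1].strip() if i + 1 < len(lines) else ""
            if not following:
                empty.append(position)
    return empty


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
        "2\n00:00:03,000 --> 00:00:04,000\n\n"
        "3\n00:00:05,000 --> 00:00:06,000\n\n"
    )
    print(empty_cues(sample))
```

An empty result means every cue has at least one line of text; any positions listed point to cues worth comparing against the GUI conversion.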
Old 31st January 2019, 07:57   #764  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 216
@Hendi: That's a bug, sorry - should be fixed in latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Next final version should be out soon...
Old 3rd February 2019, 01:55   #765  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,312
Quote:
Originally Posted by Nikse555 View Post
Subtitle Edit 3.5 is now out

(...)

but SE can also import and ocr vobsub and blu-ray image based subtitles (even from matroska/mp4 files), and DVB sub from .ts files
Subtitle Edit (the latest) actually immediately crashes for me when I import a .sup file ("Object reference not set to an instance of an object." and many other fatal errors).

Btw, why OCR-ing?! That's so 1985!
__________________
Gorgeous, delicious, deculture!
Old 4th February 2019, 13:34   #766  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 216
Quote:
Originally Posted by asarian View Post
Subtitle Edit (the latest) actually immediately crashes for me when I import a .sup file ("Object reference not set to an instance of an object." and many other fatal errors).
Could you try latest beta?
https://github.com/SubtitleEdit/subt...leEditBeta.zip

SE 3.5.9 should be out soon...
Old 10th February 2019, 08:20   #767  |  Link
dngnt
Registered User
 
Join Date: Mar 2016
Posts: 3
Quote:
Originally Posted by Nikse555 View Post
SE 3.5.9 should be out soon...
I'm using SE to convert from bitmap Chinese/Japanese/Korean subs (sup, idx/sub, DVDsub).
There is a way to save one sub picture at a time (via right-click) while importing, but could you add a function to save all the sub pictures during OCR?

For example, a new option under "Export" (during OCR): export all the pictures as *.png? (Just the images, no xml, because that would be more complicated with "dirty" idx/subs, i.e. ones with errors.)

That's because after finishing the OCR and saving the OCRed srt, I have to correct the (many) mistakes "offline", but there is no way to compare the OCRed text to the original pictures.

Thanks for your great work!

Last edited by dngnt; 10th February 2019 at 09:11.
Old 10th February 2019, 13:40   #768  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 216
@dngnt: In the OCR window you can right click in the list view... and choose "Save all images with HTML index..." or "Export -> BDN xml/png"
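
For the "offline" correction workflow asked about above, the exported images can also be paired with the OCRed text in a simple review page. This is a rough Python sketch and not how Subtitle Edit builds its own HTML index; the function name `review_page` and the image file names are hypothetical, and the caller is assumed to supply the image paths and cue texts in matching order.

```python
from html import escape
from typing import Iterable, Tuple


def review_page(pairs: Iterable[Tuple[str, str]]) -> str:
    """Build an HTML table with one row per cue: number, image, OCRed text."""
    rows = []
    for number, (image, text) in enumerate(pairs, start=1):
        src = escape(image, quote=True)
        # Escape the text first, then turn line breaks into <br> tags.
        cell = escape(text).replace("\n", "<br>")
        rows.append(
            f'<tr><td>{number}</td><td><img src="{src}"></td><td>{cell}</td></tr>'
        )
    body = "\n".join(rows)
    return f'<html><body><table border="1">\n{body}\n</table></body></html>'


if __name__ == "__main__":
    # Hypothetical file names; use whatever names the export produced.
    page = review_page([("0001.png", "First line\nSecond line")])
    print(page)
```

Writing the returned string to an .html file next to the images gives a side-by-side view for proofreading.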
Old 10th February 2019, 15:37   #769  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 216
And SE 3.5.9 is out: https://github.com/SubtitleEdit/subtitleedit/releases

Some of the changes (mostly related to OCR) listed below:

* NEW:
  * Bookmarks - thx OmrSi/marb99
  * Image export - option to have single lines top justified - thx joedmartin
* IMPROVED:
  * Improve Binary OCR of comma / apostrophe - thx Tuukka
  * Improve quote/italic detection in binary OCR - thx Miggu
  * Add context menu to OCR spell check
  * Binary OCR auto detect best DB - thx Mr. Rage
* FIXED:
  * Fix missing/bad html tags after "Auto br" - thx iromafia111
  * Fix crash in OCR window when closing - thx spetragl
  * Fix OCR in batch convert - thx danstraughn
  * Fix crash parsing empty word in OCR via Tesseract - thx Barry
Old 10th February 2019, 18:58   #770  |  Link
varekai
Registered User
 
Join Date: Jul 2006
Posts: 409
@ Nikse555

Thanks for the update, much appreciated!
Looking in the portable folders I see Tesseract302.
Is that version also included in SubtitleEdit-3.5.9-Setup.exe?

Kind regards
Old 10th February 2019, 19:41   #771  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 216
@varekai: Yes, Tesseract302 is included in the installer version as well.
Tesseract 4 is available as an in-program download. Note that T4 does not support italic detection and is a lot slower, but it is available in more languages and may work better for small, unclear fonts.
Old 10th February 2019, 20:57   #772  |  Link
varekai
Registered User
 
Join Date: Jul 2006
Posts: 409
@ Nikse555

Great! I use italics and only the Swedish and English languages, so Tesseract302 works perfectly for my needs! Thanks!
Old 11th February 2019, 22:16   #773  |  Link
dngnt
Registered User
 
Join Date: Mar 2016
Posts: 3
Quote:
Originally Posted by Nikse555 View Post
@dngnt: In the OCR window you can right click in the list view... and choose "Save all images with HTML index..." or "Export -> BDN xml/png"
Thanks for the pointers!
Subtitle Edit is the best for its import/export versatility for srt, and with these two export options for sup, idx/sub and DVDsub, it's truly the Swiss Army knife of subtitling!