Old 7th January 2019, 21:12   #761  |  Link
FLX90
Registered User
 
Join Date: Nov 2018
Posts: 33
Currently I'm using Tesseract 4.00 with engine mode Tesseract + LSTM.

Unfortunately I have to do a lot of post-correction.
'I' is sometimes recognized as 'L', 'l' or '!'.
The music symbol isn't recognized correctly (it comes out as a P or D).
And I don't know what else I may have overlooked.

Is there anything I could improve in my settings?

EDIT:
I'm using binary image compare now and creating a new database for every Blu-ray.
Works.
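
For anyone who stays with Tesseract, the 'I' read as 'l' confusion mentioned above can be partly cleaned up after OCR. This is only a minimal sketch for English text: the function name `fix_pronoun_i` and the rule itself are illustrative assumptions, not something Subtitle Edit provides. It covers only a standalone lowercase "l"; the 'L', '!' and music-symbol cases need more context and are left out.

```python
import re

# A lowercase "l" standing alone as a word (optionally followed by a
# contraction such as 'm, 'll, 've, 'd) is treated as a misread "I".
# Assumption: English text; elisions like "l'amour" are left untouched.
_LONE_L = re.compile(r"(?<![\w'])l(?=(?:'(?:m|ll|ve|d))?(?![\w']))")


def fix_pronoun_i(text: str) -> str:
    """Replace a standalone lowercase 'l' with the pronoun 'I'."""
    return _LONE_L.sub("I", text)


if __name__ == "__main__":
    print(fix_pronoun_i("l think l'm late"))
```

Words that merely contain an "l" (hello, all, late) are not touched, because the pattern requires a non-word character on both sides.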

Last edited by FLX90; 7th January 2019 at 22:53.
Old 14th January 2019, 01:04   #762  |  Link
johner23
Registered User
 
Join Date: Jul 2003
Location: Brazil
Posts: 234
Hello, dear all.

I got the newest version of Subtitle Edit and tried to convert a subtitle from one format to another.

The original subtitle is in *.vtt format. The output should be in *.srt format.

But after the conversion the srt subtitle shows some strange errors.

Eg:

Quote:
00:01:04,562 --> 00:01:08,908
{\an3}>> O que mais lhe
interessa na história?
The symbols
Quote:
{\an3}
and
Quote:
>>
are not correct.

Do you know how to fix that?

I can make manual corrections after the conversion to srt, but I would prefer that the program get it right at the first attempt, without any "errors" or strange "symbols" after the task.

Thanks for your time.

Best regards.

Last edited by johner23; 14th January 2019 at 01:07.
Old 27th January 2019, 18:12   #763  |  Link
Hendi
Registered User
 
Join Date: Jan 2019
Posts: 1
Only first text is converted on batch mode

Hi

If I convert from sub to srt in batch mode (/convert), the resulting srt file contains all timeframes, but only the first timeframe contains text. If I do this with the GUI, the whole text is converted and all timeframes contain text.

What am I doing wrong?

Thanks

Hendi
Old 31st January 2019, 06:57   #764  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 221
@Hendi: That's a bug, sorry - should be fixed in latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Next final version should be out soon...
Old 3rd February 2019, 00:55   #765  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,372
Quote:
Originally Posted by Nikse555 View Post
Subtitle Edit 3.5 is now out

(...)

but SE can also import and ocr vobsub and blu-ray image based subtitles (even from matroska/mp4 files), and DVB sub from .ts files
Subtitle Edit (the latest) actually immediately crashes for me when I import a .sup file ("Object reference not set to an instance of an object." and many other fatal errors).

Btw, why OCR-ing?! That's so 1985!
__________________
Gorgeous, delicious, deculture!
Old 4th February 2019, 12:34   #766  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 221
Quote:
Originally Posted by asarian View Post
Subtitle Edit (the latest) actually immediately crashes for me when I import a .sup file ("Object reference not set to an instance of an object." and many other fatal errors).
Could you try latest beta?
https://github.com/SubtitleEdit/subt...leEditBeta.zip

SE 3.5.9 should be out soon...
Old 10th February 2019, 07:20   #767  |  Link
dngnt
Registered User
 
Join Date: Mar 2016
Posts: 3
Quote:
Originally Posted by Nikse555 View Post
SE 3.5.9 should be out soon...
I'm using SE to convert from bitmap Chinese/Japanese/Korean subs (sup, idx/sub, DVDsub).
It is possible to save one sub picture at a time (with right click) while importing, but could you add a function for saving all the sub pictures during OCR?

For example, just a new option under "export" (during OCR): export all the pictures as *.png? (Just the images, no xml, because that would be more complicated with "dirty" idx/subs, i.e. ones with errors.)

That's because after finishing the OCR and saving the OCRed srt, I have to correct the (many) mistakes "offline", but there is no way to compare the OCRed text to the original pictures.

Thanks for your great work!

Last edited by dngnt; 10th February 2019 at 08:11.
Old 10th February 2019, 12:40   #768  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 221
@dngnt: In the OCR window you can right click in the list view... and choose "Save all images with HTML index..." or "Export -> BDN xml/png"
Old 10th February 2019, 14:37   #769  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 221
And SE 3.5.9 is out: https://github.com/SubtitleEdit/subtitleedit/releases

Some of the changes (mostly related to OCR) listed below:

* NEW:
* Bookmarks - thx OmrSi/marb99
* Image export - option to have single lines top justified - thx joedmartin
* IMPROVED:
* Improve Binary OCR of comma / apostrophe - thx Tuukka
* Improve quote/italic detection in binary OCR - thx Miggu
* Add context menu to OCR spell check
* Binary OCR auto detect best DB - thx Mr. Rage
* FIXED:
* Fix missing/bad html tags after "Auto br" - thx iromafia111
* Fix crash in OCR window when closing - thx spetragl
* Fix OCR in batch convert - thx danstraughn
* Fix crash parsing empty word in OCR via Tesseract - thx Barry
Old 10th February 2019, 17:58   #770  |  Link
varekai
Registered User
 
 
Join Date: Jul 2006
Posts: 411
@ Nikse555

Thanks for the update, much appreciated!
Looking in the portable folders I see Tesseract302.
Is that version also included in SubtitleEdit-3.5.9-Setup.exe?

Kind regards
Old 10th February 2019, 18:41   #771  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 221
@varekai: Yes, Tesseract302 is included in the installer version as well.
Tesseract 4 is available as an in-program download. Note that T4 does not support italic detection and is a lot slower, but it is available in more languages and may work better for small unclear fonts.
Old 10th February 2019, 19:57   #772  |  Link
varekai
Registered User
 
 
Join Date: Jul 2006
Posts: 411
@ Nikse555

Great! I use italics and only Swedish and English, so Tesseract302 works perfectly for my needs! Thanks!
Old 11th February 2019, 21:16   #773  |  Link
dngnt
Registered User
 
Join Date: Mar 2016
Posts: 3
Quote:
Originally Posted by Nikse555 View Post
@dngnt: In the OCR window you can right click in the list view... and choose "Save all images with HTML index..." or "Export -> BDN xml/png"
Thanks for the pointers!
Subtitle Edit is the best for its import/export versatility for srt, and with these 2 export options for sup, idx/sub and DVDsub, it's truly the Swiss Army knife of subtitling!
Old 25th February 2019, 22:34   #774  |  Link
johner23
Registered User
 
Join Date: Jul 2003
Location: Brazil
Posts: 234
Hi, dear all.

@Nikse555

Hi, Nikse.

When I tested some *.vtt files I still got some errors. Different ones, I mean. Could you please check it?

---> https://github.com/SubtitleEdit/subt...it/issues/3290

---> https://github.com/SubtitleEdit/subt....-.SAMPLES.zip

I can fix it manually using a text editor, but I think it would be better if your program could do it automatically for us.

Thanks for your time.

Best regards.
Old 28th February 2019, 21:04   #775  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 221
@johner23: thx for the files - latest beta handles this better: https://github.com/SubtitleEdit/subt...leEditBeta.zip
SE adds/keeps the alignment though, so that's why you'll see lines starting with e.g. "{\an8}". If you want to remove them, just select all lines in the list view (ctrl+a), right click, and choose "Remove formatting -> Remove all formattings / Remove alignment".
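
For srt files that have already been converted, the same clean-up can also be scripted outside SE. The following is just a rough sketch, not SE's own code: the function name `strip_alignment` and the `drop_speaker_marks` switch are made up for illustration, and treating a leading ">>" as a removable marker is an assumption based on the sample quoted earlier in the thread.

```python
import re

# Alignment override tags such as {\an3} or {\an8}
_ALIGN_TAG = re.compile(r"\{\\an[1-9]\}")
# A ">>" at the start of a line (assumed to be a caption marker)
_SPEAKER_MARK = re.compile(r"^[ \t]*>>[ \t]*", re.MULTILINE)


def strip_alignment(text: str, drop_speaker_marks: bool = False) -> str:
    """Remove {\\anN} tags and, optionally, leading '>>' markers."""
    text = _ALIGN_TAG.sub("", text)
    if drop_speaker_marks:
        text = _SPEAKER_MARK.sub("", text)
    return text


if __name__ == "__main__":
    sample = "{\\an3}>> O que mais lhe\ninteressa na história?"
    print(strip_alignment(sample, drop_speaker_marks=True))
```

The "-->" in srt timestamp lines is not affected, since the marker pattern only matches ">>" at the very start of a line.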
Old 28th February 2019, 23:23   #776  |  Link
locotus
Registered User
 
Join Date: Nov 2005
Posts: 91
Is there any way to sort lines by whether they begin with a capital or a small letter, or
to find lines beginning with a small letter that follow lines ending
with a comma, space or other punctuation mark except a period?

Thanks.
Old 2nd March 2019, 10:36   #777  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,417
Batch converting vobsub in mkv isn't working for me, neither via cli /convert (lots of "Tesseract returned with code 1" messages, timestamps are there but lines are empty) nor via GUI Tools->Batch convert (lots of "Tesseract returned with code 1" messages, almost empty files). It would also be nice to have a stream selector for mkv/mp4 input on the cli.
Old 2nd March 2019, 15:38   #778  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 221
@locotus: sorry, no - what exactly do you want to do?

@sneaker_ger: that should be fixed in last beta I hope - could you verify?
https://github.com/SubtitleEdit/subt...leEditBeta.zip (track-number parameter also added to cmd line)
Old 2nd March 2019, 16:15   #779  |  Link
locotus
Registered User
 
Join Date: Nov 2005
Posts: 91
Quote:
@locotus: sorry, no - what exactly do you want to do?
My main purpose is to join dialog lines that are
split, where the combined duration and number of characters
stay below the maximum for both.

Currently I'm trying to do that by sorting by duration first,
looking for very short lines with more than
15 characters. But I can't do that for lines with longer durations and
more characters.

Thanks anyhow.
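
Outside SE, the joining described above could be sketched roughly like this. It is only an illustration under stated assumptions: the `Cue` class, the function names and the two limits (`MAX_DURATION_MS`, `MAX_CHARS`) are hypothetical and should be set to whatever maximums you actually use. A line is joined to the previous one when the previous line does not end with a period, the next line starts with a small letter, and the combined duration and length stay within the limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


MAX_DURATION_MS = 7000  # hypothetical maximum duration of a joined line
MAX_CHARS = 84          # hypothetical maximum length of a joined line


def continues(prev: Cue, nxt: Cue) -> bool:
    """True if prev does not end with a period and nxt starts lowercase."""
    a = prev.text.rstrip()
    b = nxt.text.lstrip()
    if not a or not b:
        return False
    return not a.endswith(".") and b[0].islower()


def merge_split_cues(cues: List[Cue]) -> List[Cue]:
    """Join consecutive cues that look like one split sentence."""
    merged: List[Cue] = []
    for cue in cues:
        if merged:
            prev = merged[-1]
            joined = prev.text.rstrip() + " " + cue.text.lstrip()
            if (continues(prev, cue)
                    and cue.end_ms - prev.start_ms <= MAX_DURATION_MS
                    and len(joined) <= MAX_CHARS):
                # Extend the previous cue instead of adding a new one
                merged[-1] = Cue(prev.start_ms, cue.end_ms, joined)
                continue
        merged.append(cue)
    return merged


if __name__ == "__main__":
    demo = [
        Cue(0, 1000, "I went to the store,"),
        Cue(1100, 2000, "and bought milk."),
        Cue(3000, 4000, "Then home."),
    ]
    for c in merge_split_cues(demo):
        print(c.start_ms, c.end_ms, c.text)
```

Because each cue is compared against the already merged result, a sentence split over three or more lines is joined as long as the limits still hold.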
Old 2nd March 2019, 16:28   #780  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,417
Thx. I think there's a small typo: /? says "/trac-number:<track number>" (missing "k")

It still showed "Tesseract returned with code 1" over CLI but I figured out why: in the GUI OCR method was set to "Binary image compare". I changed it to "Tesseract 3.02" and the messages disappeared, the text appeared. I didn't expect the GUI setting to influence the CLI. Also it's not obvious why I would get Tesseract error messages if Tesseract isn't selected.

If you drag&drop mkv files into the GUI batch converter the GUI hangs for a long time, btw.

Last edited by sneaker_ger; 2nd March 2019 at 16:31.
All times are GMT +1. The time now is 08:14.

