Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.
7th January 2019, 22:12 | #761
Registered User
Join Date: Nov 2018
Posts: 34
Currently I'm using Tesseract 4.00 with engine mode Tesseract + LSTM.
Unfortunately, I have to do a lot of post-correction: 'I' is sometimes recognized as 'L', 'l' or '!', and the music symbol isn't recognized correctly (it comes out as 'P' or 'D'). I don't know what I have overlooked. Is there anything I could improve in my settings? EDIT: I'm now using binary image compare and creating a new database for every Blu-ray. That works. Last edited by FLX90; 7th January 2019 at 23:53.
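For the "lowercase l instead of I" part of the post-correction, a small cleanup pass on the OCRed text can catch the most common case. This is a minimal sketch, assuming English subtitles and a separate Python script run on the text (it is not a Subtitle Edit feature); the function name `fix_lone_l` is hypothetical. It only touches a lowercase "l" standing alone as a word (including contractions such as "l'm" or "l'll"), not the 'L' / '!' or music-symbol confusions.

```python
import re

# Hypothetical post-OCR heuristic for English text: a lowercase "l" that forms
# a word on its own (bounded by non-letters) is almost always a misread "I".
# "\b" treats the apostrophe as a boundary, so "l'm" -> "I'm", while the "ll"
# in "l'll" is left alone because it is not a single-letter word.
_LONE_L = re.compile(r"\bl\b")


def fix_lone_l(text: str) -> str:
    """Replace isolated 'l' tokens with 'I'."""
    return _LONE_L.sub("I", text)


if __name__ == "__main__":
    print(fix_lone_l("l'm sure l can do it."))  # I'm sure I can do it.
```

Keeping the rule this narrow is deliberate: a broader rule (e.g. every '!' to 'I') would damage correctly recognized lines.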
14th January 2019, 02:04 | #762
Registered User
Join Date: Jul 2003
Location: Brazil
Posts: 234
Hello, dear all.
I got the newest version of Subtitle Edit and tried to convert a subtitle from one format to another. The original subtitle is in *.vtt format and the output should be *.srt. But after the conversion, the SRT subtitle shows some strange errors, e.g.: Quote:
Quote:
Quote:
Do you know how to fix that? I can correct it manually after the conversion to SRT, but I would prefer the program to get it right on the first attempt, without any "errors" or strange "symbols". Thanks for your time. Best regards. Last edited by johner23; 14th January 2019 at 02:07.
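One step of any *.vtt to *.srt conversion is rewriting the timing lines: WebVTT writes timestamps as [hh:]mm:ss.ttt (hours optional, a dot before the milliseconds) and allows cue settings such as `line:0 align:start` after the end time, while SRT expects hh:mm:ss,mmm. A minimal Python sketch of just that step, assuming well-formed timing lines; the function names are hypothetical, this is not how Subtitle Edit implements it, and it does not handle VTT text markup such as `<c>` or `<v>` tags.

```python
import re

# WebVTT timestamp: optional hours, then mm:ss.ttt
_VTT_TS = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def vtt_ts_to_srt(ts: str) -> str:
    """Convert one WebVTT timestamp to SRT form (hh:mm:ss,mmm)."""
    m = _VTT_TS.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"not a WebVTT timestamp: {ts!r}")
    hours, minutes, seconds, millis = m.groups()
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


def vtt_timing_to_srt(line: str) -> str:
    """Convert a WebVTT timing line; cue settings after the end time are dropped."""
    start, sep, rest = line.partition("-->")
    if not sep:
        raise ValueError(f"not a timing line: {line!r}")
    end = rest.split()[0]  # first token after the arrow is the end timestamp
    return f"{vtt_ts_to_srt(start)} --> {vtt_ts_to_srt(end)}"


if __name__ == "__main__":
    print(vtt_timing_to_srt("00:01.500 --> 00:04.000 line:0 align:start"))
    # 00:00:01,500 --> 00:00:04,000
```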
27th January 2019, 19:12 | #763
Registered User
Join Date: Jan 2019
Posts: 1
Only the first text is converted in batch mode
Hi
If I convert from sub to srt in batch mode (/convert), the resulting SRT file contains all the timestamps, but only the first one has text. If I do this in the GUI, the whole text is converted and every timestamp has text. What am I doing wrong? Thanks, Hendi
31st January 2019, 07:57 | #764
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
@Hendi: That's a bug, sorry; it should be fixed in the latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
The next final version should be out soon...
3rd February 2019, 01:55 | #765
Registered User
Join Date: May 2005
Posts: 1,462
Quote:
Btw, why OCR-ing?! That's so 1985!
__________________
Gorgeous, delicious, deculture!
4th February 2019, 13:34 | #766
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
https://github.com/SubtitleEdit/subt...leEditBeta.zip
SE 3.5.9 should be out soon...
10th February 2019, 08:20 | #767
Registered User
Join Date: Mar 2016
Posts: 6
I'm using SE to convert from bitmap Chinese/Japanese/Korean subs (sup, idx/sub, DVDsub).
There is a way to save one subtitle picture at a time (with a right-click) while importing, but could you add a function to save all the subtitle pictures during OCR? For example, just a new option in the "Export" menu (during OCR): export all the pictures as *.png (just the images, no XML, because that would be more complicated with "dirty" idx/subs, i.e. ones with errors). The reason is that after finishing the OCR and saving the OCRed SRT, I have to correct the (many) mistakes "offline", but there is no way to compare the OCRed text with the original pictures. Thanks for your great work! Last edited by dngnt; 10th February 2019 at 09:11.
10th February 2019, 15:37 | #769
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
And SE 3.5.9 is out: https://github.com/SubtitleEdit/subtitleedit/releases
Some of the changes (mostly related to OCR) are listed below:
* NEW:
  * Bookmarks - thx OmrSi/marb99
  * Image export - option to have single lines top justified - thx joedmartin
* IMPROVED:
  * Improve Binary OCR of comma / apostrophe - thx Tuukka
  * Improve quote/italic detection in binary OCR - thx Miggu
  * Add context menu to OCR spell check
  * Binary OCR auto detect best DB - thx Mr. Rage
* FIXED:
  * Fix missing/bad html tags after "Auto br" - thx iromafia111
  * Fix crash in OCR window when closing - thx spetragl
  * Fix OCR in batch convert - thx danstraughn
  * Fix crash parsing empty word in OCR via Tesseract - thx Barry
10th February 2019, 19:41 | #771
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
@varekai: Yes, Tesseract302 is included in the installer version as well.
Tesseract 4 is available as an in-program download. Note that T4 does not support italic detection and is a lot slower, but it is available in more languages and may work better for small, unclear fonts.
11th February 2019, 22:16 | #773
Registered User
Join Date: Mar 2016
Posts: 6
Quote:
Subtitle Edit is the best thanks to its import/export versatility for SRT, and with these two export abilities for sup, idx/sub and DVDsub, it's truly the Swiss Army knife of subtitling!
25th February 2019, 23:34 | #774
Registered User
Join Date: Jul 2003
Location: Brazil
Posts: 234
Hi, dear all.
@Nikse555 Hi, Nikse. When I tested some *.vtt files, I still got some errors (different ones, I mean). Could you please check them?
---> https://github.com/SubtitleEdit/subt...it/issues/3290
---> https://github.com/SubtitleEdit/subt....-.SAMPLES.zip
I can fix them manually using a text editor, but I think it would be better if your program could do it automatically for us. Thanks for your time. Best regards.
28th February 2019, 22:04 | #775
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
@johner23: thx for the files; the latest beta handles this better: https://github.com/SubtitleEdit/subt...leEditBeta.zip
SE adds/keeps the alignment, though, so that's why you'll see lines starting with e.g. "{\an8}". If you want to remove them, just select all lines in the list view (Ctrl+A), right-click, and choose "Remove formatting -> Remove all formattings / Remove alignment".
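The menu command above is the way to do it inside SE. If the SRT files are already saved and you want to clean them outside the program (e.g. in a script instead of a text editor), the alignment tags are easy to match: they are ASS-style overrides {\an1} to {\an9}, numbered like a numeric keypad ({\an8} is top centre). A minimal Python sketch, assuming the tags appear exactly in that form; `strip_alignment` is a hypothetical name.

```python
import re

# ASS-style alignment override tags kept in SRT text: {\an1} ... {\an9}
_ALIGN_TAG = re.compile(r"\{\\an[1-9]\}")


def strip_alignment(text: str) -> str:
    """Remove every {\\anN} alignment tag from a subtitle line."""
    return _ALIGN_TAG.sub("", text)


if __name__ == "__main__":
    print(strip_alignment(r"{\an8}Top line"))  # Top line
```

Note that this drops the positioning entirely, so lines that were meant to sit at the top of the screen will be shown at the default (bottom) position.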
2nd March 2019, 11:36 | #777
Registered User
Join Date: Dec 2002
Posts: 5,565
Batch-converting VobSub in MKV isn't working for me, neither via the CLI /convert (lots of "Tesseract returned with code 1" messages; timestamps are there but the lines are empty) nor via GUI Tools->Batch convert (lots of "Tesseract returned with code 1" messages, almost empty files). It would also be nice to have a stream selector for MKV/MP4 input on the CLI.
2nd March 2019, 16:38 | #778
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
@locotus: sorry, no - what exactly do you want to do?
@sneaker_ger: that should be fixed in the latest beta, I hope; could you verify? https://github.com/SubtitleEdit/subt...leEditBeta.zip (a track-number parameter was also added to the command line)
2nd March 2019, 17:15 | #779
Registered User
Join Date: Nov 2005
Posts: 112
Quote:
divided, but with duration and number of characters below the maximum range for both. Actually, I'm trying to do that by sorting first by duration, looking for very short lines with more than 15 characters. But I can't do that for lines with longer durations and more characters. Thanks anyhow.
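The first filter described here (very short lines with more than 15 characters) can also be run outside the program on a saved SRT file. A minimal Python sketch, assuming standard SRT input; the 1500 ms and 15-character thresholds and all names (`parse_srt`, `short_but_wordy`, `Cue`) are hypothetical, so pick your own limits.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# SRT timing line: hh:mm:ss,mmm --> hh:mm:ss,mmm
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    duration_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT text into cues; malformed blocks are skipped."""
    cues: list[Cue] = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        m = _TIMING.search(lines[1])
        if m is None:
            continue
        g = m.groups()
        duration = _to_ms(*g[4:]) - _to_ms(*g[:4])  # end minus start
        cues.append(Cue(int(lines[0]), duration, "\n".join(lines[2:])))
    return cues


def short_but_wordy(cues: list[Cue], max_ms: int = 1500, min_chars: int = 15) -> list[Cue]:
    """Cues shorter than max_ms with more than min_chars characters
    (line breaks not counted), shortest first."""
    hits = [c for c in cues
            if c.duration_ms < max_ms and len(c.text.replace("\n", "")) > min_chars]
    return sorted(hits, key=lambda c: c.duration_ms)


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:01,800\nThis line is far too long to read\n\n"
        "2\n00:00:03,000 --> 00:00:06,000\nFine\n"
    )
    for cue in short_but_wordy(parse_srt(sample)):
        print(cue.index, cue.duration_ms, cue.text)  # 1 800 This line is far too long to read
```

Both limits are parameters, so the same function can be rerun with a larger `max_ms` for the longer lines the post mentions.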
2nd March 2019, 17:28 | #780
Registered User
Join Date: Dec 2002
Posts: 5,565
Thx. I think there's a small typo: /? says "/trac-number:<track number>" (missing "k")
It still showed "Tesseract returned with code 1" via the CLI, but I figured out why: in the GUI, the OCR method was set to "Binary image compare". I changed it to "Tesseract 3.02", the messages disappeared and the text appeared. I didn't expect the GUI setting to influence the CLI. Also, it's not obvious why I would get Tesseract error messages if Tesseract isn't selected. By the way, if you drag & drop MKV files into the GUI batch converter, the GUI hangs for a long time. Last edited by sneaker_ger; 2nd March 2019 at 17:31.