Old 20th May 2020, 14:35   #1001  |  Link
tormento
Acid fr0g
 
 
Join Date: May 2002
Location: Italy
Posts: 1,423
Quote:
Originally Posted by Nikse555 View Post
@tormento: Ah, did you set the proper "italic factor"? Right click in the list view, and choose "Set un-italic" factor (I think it's called).
What value did you put in the un-italic factor? I tried many and had no luck.
__________________
@turment on Telegram
Old 20th May 2020, 14:35   #1002  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 22
Quote:
Originally Posted by Nikse555 View Post
@Janusz: Is the crash fixed in this beta?
https://github.com/SubtitleEdit/subt...leEditBeta.zip
Yes too.
But now I don't have the "error_log.txt" file; the warning window is the same as before.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 20th May 2020 at 14:47.
Old 20th May 2020, 14:36   #1003  |  Link
tormento
Acid fr0g
 
 
Join Date: May 2002
Location: Italy
Posts: 1,423
Quote:
Originally Posted by Melan View Post
Don't be a child.
Oh.

My.

God.

41 posts and you sermonize.

__________________
@turment on Telegram

Last edited by tormento; 20th May 2020 at 14:46.
Old 20th May 2020, 17:46   #1004  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
@Janusz: OK, think I got the crash now: https://github.com/SubtitleEdit/subt...leEditBeta.zip

@tormento: Right click on image for line 3 in Apollo13 and choose "Set align angle" (previously "Set un-italic factor"). Looks like "0,21" is a good value. Does that work for you? (#pixels is space=15)

Last edited by Nikse555; 20th May 2020 at 17:49.
Old 20th May 2020, 18:02   #1005  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 22
Quote:
Originally Posted by Nikse555 View Post
@Janusz: OK, think I got the crash now: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Yes, it works, thank you.
I have a few more comments for this wonderful program that do not depend on the OCR method chosen, but first I need to prepare the appropriate files.
__________________
Sorry for my mistakes - I'm using a translator.
Old 20th May 2020, 18:46   #1006  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 1,423
Quote:
Originally Posted by Nikse555 View Post
Right click on image for line 3 in Apollo13 and choose "Set align angle" (previously "Set un-italic factor"). Looks like "0,21" is a good value. Does that work for you? (#pixels is space=15)
Unfortunately, even with fewer pixels set for a space, SE recognizes "of" as being outside the italics and attaches it to "Apollo".

__________________
@turment on Telegram
Old 20th May 2020, 20:18   #1007  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
@tormento: thx. This should now work (it did not work because the line was half italic / half regular): https://github.com/SubtitleEdit/subt...leEditBeta.zip
But it was also working before for me... and for you too, if you had been using the dictionaries included with SE, like "eng_OCRFixReplaceList.xml". Why would you not use them?
Old 21st May 2020, 14:17   #1008  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 22
Quote:
Originally Posted by Nikse555 View Post
@tormento: thx This should now work (did not work because it was half italic / half regular)
Beta 129.
For this example I created a sup file from the text "the tragedy ofAlabama", where I marked "the tragedy of" as italic.

I only installed the following dictionaries: French, German, Italian, and English, without additional OCRFixReplaceList.xml files.
For French, German, Italian, and English the patch works OK. The text after OCR looks like this: "<i>the tragedy of</i> Alabama";
for Polish and "none" it looks like this: "<i>the tragedy</i> ofAlabama".
I did not check the others, but I think the fix should work in all languages, because "ofAlabama" is not a correct word in any language, and a split of this kind can always occur at an italic/regular or regular/italic boundary, regardless of the language chosen or how many words the selected dictionary contains. An example from Polish: "fotografAdam" (photographer Adam).
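
The language-independent split described above could be sketched roughly like this in Python. This is only an illustration of the idea, not how Subtitle Edit does it: the function name split_glued_words is made up for the example, and the rule (a lowercase letter directly followed by an uppercase letter marks a missing space) is an assumption that would also split legitimate words such as "McDonald". It does not move the </i> tag either.

```python
def split_glued_words(text: str) -> str:
    """Insert a space wherever a lowercase letter is directly followed by an uppercase one."""
    out = []
    # Pair every character with the one before it (a space stands in before the first).
    for prev, cur in zip(" " + text, text):
        if prev.islower() and cur.isupper():
            out.append(" ")
        out.append(cur)
    return "".join(out)


print(split_glued_words("<i>the tragedy</i> ofAlabama"))
print(split_glued_words("fotografAdam"))
```

Because str.islower() and str.isupper() are Unicode-aware, the same check also covers letters such as "ł" or "Ś" without a per-language dictionary.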
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 21st May 2020 at 15:01.
Old 21st May 2020, 16:50   #1009  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
Quote:
Originally Posted by Janusz View Post
Beta 129.
For this example I created a sup file from the text "the tragedy ofAlabama", where I marked "the tragedy of" as italic.

I only installed the following dictionaries: French, German, Italian, and English, without additional OCRFixReplaceList.xml files.
For French, German, Italian, and English the patch works OK. The text after OCR looks like this: "<i>the tragedy of</i> Alabama";
for Polish and "none" it looks like this: "<i>the tragedy</i> ofAlabama".
I did not check the others, but I think the fix should work in all languages, because "ofAlabama" is not a correct word in any language, and a split of this kind can always occur at an italic/regular or regular/italic boundary, regardless of the language chosen or how many words the selected dictionary contains. An example from Polish: "fotografAdam" (photographer Adam).
Yes, the OCR process benefits from a good OCR fix replace list.
I've added a Polish one based on your input here: https://github.com/SubtitleEdit/subt...eplaceList.xml
Feel free to add to it
Old 21st May 2020, 21:57   #1010  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 22
Quote:
Originally Posted by Nikse555 View Post
Yes, the OCR process benefits from a good OCR fix replace list.
Quote:
<WordPart from="f" to="f " /> <!-- "f" will be two words -->
I had this line in <OCRFixReplaceList>, so something else must have been going on here.
In the first version of the file, the line contained only the one phrase "photographerAdam".
"Adam" is only 5 letters; I thought maybe that was the reason?

I have created a new file. I added a few lines and longer words starting with "A".



OCR worked, but as you can see above, not quite.
The split happened, but </i> is not everywhere it should be. Only line 2 is correct.
I disabled the split after "f" in "pol_OCRFixReplaceList.xml". The result is shown at the bottom.
The split is correct and </i> is where it should be, including on line 1.

Conclusion: because such a combination of words is rare, we have to decide for ourselves
whether and when to enable this option in our "OCRFixReplaceList.xml",
because it can do more damage than it is worth.
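
To illustrate why a rule like this can do more damage than good, here is a small Python sketch that applies WordPart rules naively and unconditionally. It is an assumption-laden toy: the XML nesting is simplified, the helper names are invented, and Subtitle Edit's real logic for deciding when a rule fires is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: the real replace list has more structure.
SNIPPET = """
<OCRFixReplaceList>
  <WordPart from="f" to="f " />
</OCRFixReplaceList>
"""


def load_word_parts(xml_text):
    """Return (from, to) pairs for every WordPart element in the XML text."""
    root = ET.fromstring(xml_text)
    return [(e.get("from"), e.get("to")) for e in root.iter("WordPart")]


def apply_word_parts(word, rules):
    """Apply every rule to the word without any dictionary check (naive on purpose)."""
    for src, dst in rules:
        word = word.replace(src, dst)
    return word.strip()


rules = load_word_parts(SNIPPET)
for word in ("ofAlabama", "fotografAdam"):
    print(repr(word), "->", repr(apply_word_parts(word, rules)))
```

With this blind approach "ofAlabama" comes out right, but "fotografAdam" is also split after its first "f", which is the kind of side effect that makes the rule worth enabling only when it is really needed.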

If you really have nothing else to do, you could look into the source, because switching the dictionary repeatedly to any installed one
and running OCR again each time with the new dictionary makes what now looks so nice at the bottom look like the top again.
Only starting OCR again restores order.
I know that nobody will mix dictionaries in normal use, but the problem exists.
__________________
Sorry for my mistakes - I'm using a translator.
Old 23rd May 2020, 10:16   #1011  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 1,423
Quote:
Originally Posted by Nikse555 View Post
But it was also working before for me... and for you too, if you had been using the dictionaries included with SE, like "eng_OCRFixReplaceList.xml". Why would you not use them?
It does work with OCR fix, but not without.

I tend not to use it because I have trained the OCR so well that I can postprocess I-l after OCR and get the job done faster.

Perhaps you could implement an OCR fix that handles OCR errors only and is not word-dictionary aware.
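
As a rough idea of what a dictionary-free I-l post-processing pass could look like, here is a minimal Python sketch. The two rules are assumptions chosen for English text only, the function name is made up, and this is not Subtitle Edit's implementation.

```python
import re


def fix_il_confusion(text: str) -> str:
    """Very small I/l clean-up using two English-only heuristics."""
    # A lone lowercase "l" as a whole word is almost always the pronoun "I".
    text = re.sub(r"\bl\b", "I", text)
    # An uppercase "I" squeezed between two lowercase letters is probably an "l".
    text = re.sub(r"(?<=[a-z])I(?=[a-z])", "l", text)
    return text


print(fix_il_confusion("l wiIl go"))
```

Rules like these are tuned to English and would need checking before being applied to any other language.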
__________________
@turment on Telegram
Old 23rd May 2020, 12:56   #1012  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 22
@Nikse555
I want to report a bug.
It occurs since beta 119; beta 112 works fine.

The previously reported bug in beta 123 and later concerned a missing dictionary.
Because I rarely use "Prompt for unknown words", the option was not enabled and so was not tested.

In my previous post I used "Binary image compare", so I didn't notice this error.
Today I returned to nOCR. My settings: a dictionary must be selected for the function to work, so one is selected.
"Draw missing texts" - disabled, so that the program does not prompt for every new unknown letter.
(Even for this function alone it is worth using nOCR.)
"Prompt for unknown words" - enabled.
"Fix OCR errors" - disabled, so OCR does not use the user files.
"Try to guess unknown words" - irrelevant while "Fix ..." is disabled; it does not work even though it is turned on.

"Start OCR" processes the text until it encounters the first unknown word.
With a well-built character database this will be a word that is not in the dictionary; otherwise it will be an unrecognized character in a word.
The process opens the "Spell check" window. Choosing "Skip one", "Skip all", or "Abort" brings up an error window:



Depending on what we choose there, we return either to Windows (the program crashes) or to the program.
I checked this with various sup files, including those available from this forum.

@Tormento
Quote:
I tend not to use it because I have trained the OCR so well that I can postprocess I-l after OCR and get the job done faster.
Perhaps you could implement an OCR fix that handles OCR errors only and is not word-dictionary aware.
I do not use "Fix common OCR errors - also use hard-coded rules" because this option does more than its name suggests, especially when it comes to Il and iL. For this reason I do not use "I" in the character database. I definitely have fewer mistakes to correct, and at least I know which ones.
My assumption is that changes made for English should not affect the other languages available in the program.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 23rd May 2020 at 13:34.
Old 23rd May 2020, 20:33   #1013  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
@Janusz: thx for the crash info
Should hopefully be fixed here: https://github.com/SubtitleEdit/subt...leEditBeta.zip

Also, Ctrl+T in the OCR window will start some auto-training... probably not too useful, but it's a little fun to play with.
Old 23rd May 2020, 21:55   #1014  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 22
Patch works, thank you.

Quote:
Originally Posted by Nikse555 View Post
Also, Ctrl+T in the OCR window will start some auto-training... probably not too useful, but it's a little fun to play with.
This function was, and still is, also available from the right-click menu.
However, using it did not bring up any additional windows as it does now.
Something was happening in the background, but the results of that work could not be seen.
I noticed this window yesterday, but I didn't have time to check exactly what it was.
I am curious myself how this file will look.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; Yesterday at 07:28.
Old Yesterday, 08:58   #1015  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
Forgot... for training you need a .srt file with spaces around characters, like:
Code:
1
00:00:00,490 --> 00:00:02,350
a b c d e f g h i j k l m n o p q r s t u

2
00:00:02,530 --> 00:00:04,150
v w x y z

3
00:00:04,240 --> 00:00:06,240
0 1 2 3 4 5 6 7 8 9 , . ( ) [ ] ' " $ % ♫ ♪  &

4
00:00:06,510 --> 00:00:08,200
A B C D E F G H I J K L M N O P Q R S T U

5
00:00:08,320 --> 00:00:10,570
V W X Y Z

6
00:00:11,510 --> 00:00:13,510
: ; - ! ?

7
00:00:13,540 --> 00:00:15,540
  Č Ę Ė Į  Ū  č ę ė į  ų 

8
00:00:15,560 --> 00:00:17,560
            

9
00:00:17,584 --> 00:00:19,584
ff ft fi fj fl rf rt rv rw ry rt ryt tt TV tw yt yw
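
A training file in this shape can also be generated instead of typed by hand. The following Python sketch is just one way to do it: the function names, the 2-second cue length, and the 100 ms gap are arbitrary assumptions, and only a few of the character groups from the file above are included.

```python
def fmt(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def make_training_srt(groups, duration_ms=2000, gap_ms=100):
    """Build SRT text with one cue per group, tokens separated by single spaces."""
    cues = []
    start = 0
    for number, tokens in enumerate(groups, start=1):
        end = start + duration_ms
        cues.append(f"{number}\n{fmt(start)} --> {fmt(end)}\n{' '.join(tokens)}\n")
        start = end + gap_ms
    return "\n".join(cues)


groups = [
    list("abcdefghijklmnopqrstu"),
    list("vwxyz"),
    ["ff", "ft", "fi", "fj", "fl"],  # multi-character combinations stay together
]
srt = make_training_srt(groups)
print(srt)
```

Writing the result with open("train.srt", "w", encoding="utf-8") keeps accented characters intact; extra double or triple combinations are just more entries in a group.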

Also, the unattended OCR alarm (taskbar blink/beep) is now customizable via these settings in Settings.xml (in the latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip):
<UnfocusedAttentionBlinkCount>50</UnfocusedAttentionBlinkCount>
<UnfocusedAttentionPlaySoundCount>2</UnfocusedAttentionPlaySoundCount>
<UnfocusedAttentionPlaySoundEvery>2</UnfocusedAttentionPlaySoundEvery>
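
For anyone who prefers to script the change rather than edit the file by hand, here is a hedged Python sketch. Only the three element names come from the post above; the sample document, its root element, and the function name are assumptions, and the code searches the whole tree because the exact position of these elements inside Settings.xml is not stated here.

```python
import xml.etree.ElementTree as ET

# Stand-in for Settings.xml; the real file contains many more elements.
SAMPLE = """<Settings>
  <UnfocusedAttentionBlinkCount>50</UnfocusedAttentionBlinkCount>
  <UnfocusedAttentionPlaySoundCount>2</UnfocusedAttentionPlaySoundCount>
  <UnfocusedAttentionPlaySoundEvery>2</UnfocusedAttentionPlaySoundEvery>
</Settings>"""


def set_alarm(root, blink_count, sound_count, sound_every):
    """Update the three alarm settings wherever they occur; return the tags changed."""
    values = {
        "UnfocusedAttentionBlinkCount": blink_count,
        "UnfocusedAttentionPlaySoundCount": sound_count,
        "UnfocusedAttentionPlaySoundEvery": sound_every,
    }
    changed = []
    for tag, value in values.items():
        for element in root.iter(tag):
            element.text = str(value)
            changed.append(tag)
    return changed


root = ET.fromstring(SAMPLE)
print(set_alarm(root, blink_count=10, sound_count=0, sound_every=5))
print(ET.tostring(root, encoding="unicode"))
```

For the real file, ET.parse(path) followed by tree.write(path, encoding="utf-8", xml_declaration=True) would replace the in-memory sample; close Subtitle Edit first and keep a backup of Settings.xml.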
Old Today, 11:57   #1016  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 22
And the game is over.
A text with 6052 lines, both single lines (up to 43 characters) and broken lines (more than 43 characters), was read without my needing to add even 1 character. I'm really shocked.
One thing I improved: I added a few triple and a dozen double character combinations to your train.srt file.
__________________
Sorry for my mistakes - I'm using a translator.