Subtitle Edit 4.0.4 - Page 49

Nikse555 · 10th May 2020, 06:56

@tormento:
"ABBASSO I GLADIATORS" is not changed here... wrong language or something in your dictionaries?
Also, I'm not sure what you mean by "at least put Italic on the right side of the character to input during binary compare." - could you make a screenshot?

@jlw_4049/Janusz: I also cannot re-create the resize-and-restore-issue in latest beta, but I'll test on a few other computers.
jlw_4049, did you check version in Help -> About - also, how do you restore the minimized OCR window?

Latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Contains some good fixes for rippers:
- Bluray sup files could miss some images (where a subtitle would be expanded with more text)
- Teletext from .ts/.m2ts/.mts sometimes missed last subtitle

jlw_4049 · 10th May 2020, 07:04

Quote:

Originally Posted by Nikse555

@tormento:
"ABBASSO I GLADIATORS" is not changed here... wrong language or something in your dictionaries?
Also, I'm not sure what you mean by "at least put Italic on the right side of the character to input during binary compare." - could you make a screenshot?

@jlw_4049/Janusz: I also cannot re-create the resize-and-restore-issue in latest beta, but I'll test on a few other computers.
jlw_4049, did you check version in Help -> About - also, how do you restore the minimized OCR window?

Latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Contains some good fixes for rippers:
- Bluray sup files could miss some images (where a subtitle would be expanded with more text)
- Teletext from .ts/.m2ts/.mts sometimes missed last subtitle

I'll try the latest beta. Maybe I have an outdated version of the beta. Will double-check in the AM.

It's been working perfectly other than that.

Will report back.


tormento · 10th May 2020, 11:41

Quote:

Originally Posted by Nikse555

@tormento: "ABBASSO I GLADIATORS" is not changed here... wrong language or something in your dictionaries?

Same issue with:

Code:

871
01:00:00,263 --> 01:00:02,849
<i>Ripeto. I sospetti del Nite Owl
sono scappati.</i>

Here is the srt.

Fresh install. The OCR files are the ones you distribute.

Quote:

Originally Posted by Nikse555

Also, I'm not sure what you mean by "at least put Italic on the right side of the character to input during binary compare." - could you make a screenshot?

Here it is:

tormento · 12th May 2020, 18:08

It would be nice if aborting OCR recognition did not discard the text of the current paragraph, but kept it up to the unrecognized character.

Sometimes a strange symbol can't be corrected by simply expanding and I have to abort to enter it manually. Unfortunately I then have to enter the whole text!

Janusz · 12th May 2020, 22:15

While we are on the subject: the <Import/OCR Blu-ray (.sup)...> window behaves inconsistently depending on the selected OCR method.

Maybe this is intentional, but:

- When the OCR process is stopped with <Binary image compare> or <OCR via nOCR> selected, it behaves as @tormento described above. With <Tesseract> selected, the current line is recognized to the end and only then is the process stopped.

- On the right side of the window there are 3 lists: <Unknown words>, <All fixes> and <Guesses used>. While OCR is running, these lists are populated accordingly. When the process is stopped and resumed, the <Unknown words> list is cleared completely, but the other two are not. So before resuming I always have to check the unknown words or correct the errors in the <Unknown words> list before they disappear.

I think a better solution would be to append to this list, just as with the other two.
Ideally, in all 3 lists the new entries would replace the old ones from the line where the process was resumed, rather than being appended at the end.
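
The list behaviour suggested above could be sketched roughly like this in Python. It is only an illustration: the function name `merge_on_resume` and the `(line_number, text)` entry format are hypothetical and are not taken from Subtitle Edit's code.

```python
def merge_on_resume(old_entries, new_entries, resume_line):
    """Merge one side list (e.g. unknown words) after OCR is resumed.

    Entries are (line_number, text) tuples. Everything before resume_line
    is kept from the previous run; everything from resume_line onward is
    taken from the new run, so nothing is lost and nothing is duplicated.
    """
    kept = [e for e in old_entries if e[0] < resume_line]
    fresh = [e for e in new_entries if e[0] >= resume_line]
    return sorted(kept + fresh)


# Hypothetical data: OCR was stopped and then resumed from line 40.
old = [(12, "teh"), (40, "wrod"), (73, "helo")]
new = [(40, "wrod"), (81, "exampel")]
print(merge_on_resume(old, new, 40))
# [(12, 'teh'), (40, 'wrod'), (81, 'exampel')]
```

The same function could serve all 3 lists, which would make them behave identically on resume.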

- Also in the <VobsubOCRNOcrCharacter> window, not only in <VobSub - Manual image to text> (@GCRaistlin wrote about it here): a <Skip entire image> button could be useful, e.g. for illegible images and more.

Finally: there is an error in the Polish translation of the program, in line 2528:

Code:

is: <Skip>P&amp;omoń</Skip>

to be: <Skip>P&amp;omiń</Skip>

Melan · 13th May 2020, 11:28

Quote:

Originally Posted by Janusz

Code:

is: <Skip>P&amp;omoń</Skip>

to be: <Skip>P&amp;omiń</Skip>

And another error (line 2532):

<AutoSubmitOnFirstChar>Autom. proponuj &amp;pierwszy znak</AutoSubmitOnFirstChar>

<AutoSubmitOnFirstChar>Autom. proponuj pierwszy znak</AutoSubmitOnFirstChar>

borifax

Nikse555 · 13th May 2020, 14:51

Quote:

Originally Posted by tormento

Same issue with:
[CODE]

Good idea, fixed in latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip

Also, "Skip" in the OCR char window will now only skip from current character (and not the whole line).

@Melan/Janusz: thx - updated Polish translation.
(the "&" string will cause the following letter to be a shortcut - e.g. "&Skip" will react to the "Alt+S" shortcut).
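
For translators who want to check which Alt shortcut a label ends up with, here is a minimal Python sketch. It assumes the usual Windows convention that a doubled "&&" is a literal ampersand, and that "&amp;" in the XML language file stands for a plain "&". The helper name `mnemonic` is made up for this example and is not part of Subtitle Edit.

```python
def mnemonic(label):
    """Return the Alt-shortcut letter of a label, or None if it has none."""
    text = label.replace("&amp;", "&")  # XML-escaped ampersand -> plain "&"
    i = 0
    while i < len(text) - 1:
        if text[i] == "&":
            if text[i + 1] == "&":  # "&&" is an escaped literal ampersand
                i += 2
                continue
            return text[i + 1].upper()
        i += 1
    return None


print(mnemonic("&Skip"))                               # S
print(mnemonic("P&amp;omiń"))                          # O
print(mnemonic("Autom. proponuj pierwszy znak"))       # None
```

Running something like this over all labels of one window would also reveal two controls that claim the same letter.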

Janusz · 14th May 2020, 14:36

Note: Applies to version 3.5.15 NEXT, beta 92.

Thank you for this change, Nikse555.

There is an error in how the <Unknown words> list is built.

Lines # 40, # 73 and # 81 contain the word FBl, which should be FBI.
I had already added the word FBI to the "names.xml" dictionary earlier.
Using <Add pair to OCR replace list> I add the pair FBl -> FBI. I start OCR and get this:

In the <Subtitle text> window you can see that the replacement has been made and the word is known; the green color of the line confirms it.
But <Unknown words> still shows line # 40: FBl, although # 73 and # 81 are gone.
Adding more word pairs works correctly: they do not appear in the list again, unless of course a new word is missing from the dictionary.
In this particular case line # 40 disappears only when I close the <Import / OCR Blu-ray ...> window and start the whole OCR process again.
But then another line with a different word becomes the first one and stays with us until the window is closed.
I also checked this with words added to the dictionary: the first line shown with an unknown word does not disappear.

Edit 1

The duplicated first lines always appear on the second and subsequent scans of the file, in all 3 lists, also after changes made automatically
by the rules from the OCRFixReplaceList_User and OCRFixReplaceList files, or with the <Fix common OCR errors ...> option enabled in Options/Settings/Tools.
They do not appear for the automatic conversion of "l" (lowercase L) into "I" by Subtitle Edit, but that conversion is not shown in any of the lists anyway,
except as an unknown word when the replacement creates a new incorrect word.
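
To make the expected behaviour concrete, here is a small Python sketch of a whole-word replace list followed by an unknown-word check. This is not Subtitle Edit's implementation; the function names and the tiny word lists are invented for illustration. The point is only that the unknown-word check should run on the text after the replace list has been applied.

```python
import re


def apply_replace_list(line, pairs):
    """Apply whole-word replacements; return (fixed_line, fixes_made)."""
    fixes = []
    for wrong, right in pairs.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        line, count = re.subn(pattern, right, line)
        if count:
            fixes.append((wrong, right, count))
    return line, fixes


def unknown_words(line, words, names):
    """Return words found neither in the word list nor in the names list."""
    found = re.findall(r"[^\W\d_]+", line)
    return [w for w in found if w.lower() not in words and w not in names]


words = {"the", "is", "here"}   # stand-in for a spelling dictionary
names = {"FBI"}                 # stand-in for names.xml
pairs = {"FBl": "FBI"}          # stand-in for the OCR replace list

raw = "The FBl is here."
fixed, fixes = apply_replace_list(raw, pairs)
print(unknown_words(raw, words, names))    # ['FBl']
print(fixed)                               # The FBI is here.
print(unknown_words(fixed, words, names))  # []
```

With this order of operations, line # 40 would not stay in the unknown-words list once the FBl -> FBI pair exists.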

tormento · 14th May 2020, 15:12

Quote:

Originally Posted by Nikse555

Also, "Skip" in the OCR char window will now only skip from current character (and not the whole line).

Thanks and please apply to abort too.

Nikse555 · 16th May 2020, 20:49

Latest beta has new (and hopefully improved) detection of space between italic letters: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Do let me know how it works! (it uses the value from "Set un-italic factor" in the list view context menu - probably normally between 0.22-0.32)
@tormento: thx for the test .sup files
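
For anyone wondering why an un-italic factor can help with finding spaces between italic letters, here is a rough Python sketch. It assumes that the bitmap is sheared row by row by `factor` pixels per row so that slanted strokes become upright, and that gaps are then searched as empty columns. That is a guess made for illustration, not a description of what Subtitle Edit actually does, and the function names are invented.

```python
def unitalic(rows, factor):
    """Shear a bitmap so right-leaning strokes become upright.

    rows: equal-length strings, '#' = ink, '.' = background, rows[0] = top.
    Row y is shifted right by round(factor * y) pixels.
    """
    shifts = [int(factor * y + 0.5) for y in range(len(rows))]
    pad = max(shifts, default=0)
    return ["." * s + row + "." * (pad - s) for row, s in zip(rows, shifts)]


def ink_runs(rows):
    """Return (first_col, last_col) for each run of columns containing ink."""
    width = len(rows[0]) if rows else 0
    runs, start = [], None
    for x in range(width):
        inked = any(row[x] == "#" for row in rows)
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, width - 1))
    return runs


# Two slanted strokes whose columns overlap, so no empty column separates them.
italic = ["..#.#", "..#.#", ".#.#.", ".#.#.", ".#.#.", ".#.#.", "#.#..", "#.#.."]
print(ink_runs(italic))                  # [(0, 4)]
print(ink_runs(unitalic(italic, 0.25)))  # [(2, 2), (4, 4)]
```

In this toy bitmap the two strokes only become separable after the shear, which is why a badly chosen factor would be expected to merge or split letters.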

Quote:

Originally Posted by tormento

Thanks and please apply to abort too.

I actually meant that it works for the "Abort" button

@Janusz: I've fixed an issue related to your last post, but it's really hard to test without your exact setup/sup... could you make a .zip archive with all relevant files, if latest beta still has issues?

GCRaistlin · 16th May 2020, 23:22

When performing OCR I am unable to add a proper match for the percent sign (https://mir.cr/10PHMJUD, # 68): SE recognizes its first part as "o". To add a better match, I deleted this "o" from the DB and ran OCR again. This time the first part was recognized as "O", and the 'Delete' button is inactive.

jlw_4049 · 17th May 2020, 00:52

The latest BETA struggles with ♪ characters very badly.

Nikse555 · 17th May 2020, 06:41

Quote:

Originally Posted by GCRaistlin

When performing OCR it is unable to add a proper match for the percent sign (https://mir.cr/10PHMJUD, # 68): SE recognizes its first part as "o". To add a better match, I deleted this "o" from the DB and run OCR again. This time the first part was recognized as "O", and 'Delete' button is inactive.

thx for the file

To fix "%", double-click the line in the list view of the main OCR window, then right-click in the list box in the "Inspect" window, choose "Add better multi match", and expand the image selection to cover the whole "%" sign:

Quote:

Originally Posted by jlw_4049

The latest BETA struggles with ♪ characters very badly.

I probably need more info... subtitle + screenshots... you're using Tesseract for OCR'ing?

Nikse555 · 17th May 2020, 08:25

Shortcuts for the "OCR Character" window are:

Expand selection: Alt + arrow right
Shrink selection: Alt + arrow left
Toggle italic: Ctrl+I (+ Alt+I depending on translation)
Toggle auto-submit-first-char: Alt+F (depending on translation)
Skip current letter(s): Esc (+ Alt+S depending on translation)
Skip entire subtitle: Ctrl+Shift+S (new shortcut)

tormento · 17th May 2020, 11:37

Quote:

Originally Posted by Nikse555

Shortcuts for the "OCR Character" window is

Now that you made me think about it, it would be nice to be able to expand on the right and/or the left side. Sometimes the % character has bad OCR on the left and/or on the right too. The only thing I can do now is abort and input it manually. Two buttons such as

|←|expand|→|
|→|shrink|←|

or the same changing function with SHIFT key would be nice.

P.S: The red italic word on the right of the character is great. You could remove the one on top of the window now.

GCRaistlin · 17th May 2020, 22:39

What does auto-submit-first-char do?

Nikse555 · 17th May 2020, 23:02

Quote:

Originally Posted by GCRaistlin

What does auto-submit-first-char do?

It will use the first key down as the OCR letter without waiting for a click on "OK" or the "Enter" key pressed.

I often set the error rate to zero when starting OCR of a new sub for the first 10-20 lines, in which case I add a lot of single letters, and that's much faster without having to press the "Enter" key or the "OK" button.
(you need to turn it off again, if the prompt is for a multi letter image, like "ft")

GCRaistlin · 17th May 2020, 23:11

Nikse555
Thanks. I'd say a brief explanation of this option should be added to the UI, as well as for "add better multi match", since these options' names aren't self-explanatory.

GCRaistlin · 17th May 2020, 23:25

Quote:

Originally Posted by Nikse555

thx for the file

To fix "%" double click in the list view in main OCR window, then right-click in the list box in the "Inspect" windows and choose "Add better multi match", then expand the images to cover the "%" sign:

Something went wrong. I performed all the actions above, then reran OCR from this line. SE didn't ask me anything, but the percent sign is missing in the recognized line:

UPD: It seems that I didn't enter "%" in the field. It would be worth checking that the field isn't empty...

GCRaistlin · 17th May 2020, 23:47

Bug(s):

Follow the steps above but add a wrong match, for example "@".
Start OCR from the same line, then interrupt it.
Call 'Inspect compare matches' window.
Delete the wrong match, add the right match, press OK.
Start OCR from the same line again.
You'll get 'VobSub - Manual image to text' window for the char you have just added a match for. And by the way the window title is incorrect - it's not the VobSub being recognized. But let's go further.
Press Abort, try to add multi match again. You'll get 'Image already in db' error.
