Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

10th May 2020, 06:56   #961
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 315
@tormento:
"ABBASSO I GLADIATORS" is not changed here... wrong language or something in your dictionaries?
Also, I'm not sure what you mean by "at least put Italic on the right side of the character to input during binary compare." - could you make a screenshot?

@jlw_4049/Janusz: I also cannot re-create the resize-and-restore-issue in latest beta, but I'll test on a few other computers.
jlw_4049, did you check version in Help -> About - also, how do you restore the minimized OCR window?


Latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Contains some good fixes for rippers:
- Blu-ray .sup files could miss some images (where a subtitle would be expanded with more text)
- Teletext from .ts/.m2ts/.mts sometimes missed the last subtitle

10th May 2020, 07:04   #962
jlw_4049
Registered User
 
Join Date: Sep 2018
Posts: 277
Quote:
Originally Posted by Nikse555
@jlw_4049/Janusz: I also cannot re-create the resize-and-restore-issue in latest beta, but I'll test on a few other computers.
jlw_4049, did you check version in Help -> About - also, how do you restore the minimized OCR window?
I'll try the latest beta. Maybe I have an outdated version of the beta. Will double-check in the AM.

It's been working perfectly other than that.

Will report back.

__________________
FFMPEG Audio Encoder

10th May 2020, 11:41   #963
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,526
Quote:
Originally Posted by Nikse555
@tormento: "ABBASSO I GLADIATORS" is not changed here... wrong language or something in your dictionaries?
Same issue with:
Code:
871
01:00:00,263 --> 01:00:02,849
<i>Ripeto. I sospetti del Nite Owl
sono scappati.</i>


Here is the srt.

Fresh install. The OCR files are the ones you distribute.

Quote:
Originally Posted by Nikse555
Also, I'm not sure what you mean by "at least put Italic on the right side of the character to input during binary compare." - could you make a screenshot?
Here it is:

__________________
@turment on Telegram

12th May 2020, 18:08   #964
tormento
It would be nice if aborting OCR recognition did not discard the text of the current paragraph, but kept everything up to the unrecognized character.

Sometimes a strange symbol can't be corrected by simply expanding the selection, and I have to abort to enter it manually. Unfortunately, I then have to enter the whole text!

12th May 2020, 22:15   #965
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 58
While we are on the subject, the <Import/OCR Blu-ray (.sup)...> window behaves
inconsistently depending on the selected OCR method.

Maybe this is intentional, but:
  • when the OCR process is stopped with <Binary image compare> or <OCR via nOCR>
    selected, it behaves as @tormento described above.
    With <Tesseract> selected, the line is recognized to the end
    and only then is the process stopped.
  • the right side of the window has three lists: <Unknown words>, <All fixes> and <Guesses used>.
    While the OCR process is running, these lists are populated accordingly.
    When the process is stopped and resumed, the <Unknown words> list is cleared completely,
    but the other two are not. So before resuming the process, I always have to
    check the unknown words or correct the errors in the <Unknown words> list before they disappear.

    I think a better solution would be to keep adding to this list, just as with the other two.
    Ideally, in all 3 lists, new entries would replace the old ones starting from the line where the process
    was resumed, instead of being appended at the end.
  • the <Skip entire image> button, which @GCRaistlin wrote about here, could also be useful
    in the <VobsubOCRNOcrCharacter> window, not only in <VobSub - Manual image to text>,
    e.g. for illegible images.
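
The list behaviour suggested above (keep entries from before the resume point, regenerate the rest, never duplicate) can be sketched roughly as follows. This is only an illustration, not Subtitle Edit's code: `ListEntry`, `resume_from` and `add_entry` are hypothetical names, and the sample words are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListEntry:
    line: int  # subtitle line number the entry came from
    text: str  # text shown in the list


def resume_from(entries, resume_line):
    """Drop entries at or after resume_line; the new OCR pass regenerates them."""
    return [e for e in entries if e.line < resume_line]


def add_entry(entries, entry):
    """Append an entry unless the same (line, text) pair is already listed."""
    if entry not in entries:
        entries.append(entry)


# Hypothetical state of one list before the user stops and resumes at line 30.
unknown_words = [ListEntry(12, "teh"), ListEntry(30, "wrod"), ListEntry(31, "abc")]
unknown_words = resume_from(unknown_words, 30)
add_entry(unknown_words, ListEntry(30, "wrod"))  # found again on the new pass
add_entry(unknown_words, ListEntry(30, "wrod"))  # a repeated hit is ignored
print(unknown_words)
```

With this approach the entry for line 12 survives the resume, line 30 is listed exactly once, and the stale entry for line 31 is gone until OCR reaches that line again. The same two helpers would apply unchanged to all three lists.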

Finally, there is an error in the program's Polish translation, on line 2528:

Code:
is: <Skip>P&amp;omoń</Skip>

to be: <Skip>P&amp;omiń</Skip>
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 12th May 2020 at 22:27.

13th May 2020, 11:28   #966
Melan
Registered User
 
Join Date: Jan 2014
Location: Poland
Posts: 60
Quote:
Originally Posted by Janusz

Code:
is: <Skip>P&amp;omoń</Skip>

to be: <Skip>P&amp;omiń</Skip>
And another error (line 2532):

is: <AutoSubmitOnFirstChar>Autom. proponuj &amp;amp;pierwszy znak</AutoSubmitOnFirstChar>

to be: <AutoSubmitOnFirstChar>Autom. proponuj pierwszy znak</AutoSubmitOnFirstChar>

borifax

13th May 2020, 14:51   #967
Nikse555
Quote:
Originally Posted by tormento
Same issue with:
Good idea, fixed in latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip

Also, "Skip" in the OCR char window will now only skip from current character (and not the whole line).


@Melan/Janusz: thx - updated Polish translation.
(the "&amp;" string will cause the following letter to be a shortcut - e.g. "&amp;Skip" will react to the "Alt+S" shortcut).
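
The "&" convention mentioned here can be illustrated with a small sketch. It is not Subtitle Edit's code: `mnemonic` and `label_from_xml` are hypothetical helper names, and treating "&&" as a literal ampersand is an assumption based on the usual Windows Forms convention. In the translation XML the ampersand is written as `&amp;`, which the XML parser decodes to a plain `&` before the label is used.

```python
import xml.etree.ElementTree as ET


def label_from_xml(fragment):
    """Parse one translation element and return its decoded text."""
    return ET.fromstring(fragment).text


def mnemonic(label):
    """Return the Alt-shortcut letter marked by '&' in a label, or None."""
    i = 0
    while i < len(label) - 1:
        if label[i] == "&":
            if label[i + 1] == "&":  # assumed: "&&" is a literal ampersand
                i += 2
                continue
            return label[i + 1].upper()
        i += 1
    return None


for fragment in ("<Skip>&amp;Skip</Skip>", "<Skip>P&amp;omiń</Skip>"):
    label = label_from_xml(fragment)
    print(label, "-> Alt+" + str(mnemonic(label)))
```

Under these assumptions, "&Skip" maps to Alt+S and the corrected Polish string "P&omiń" maps to Alt+O. A label without any "&" has no shortcut at all, which is why the character should stay in a translation.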

14th May 2020, 14:36   #968
Janusz
Note: Applies to version 3.5.15 NEXT, beta 92.

Thank you for this change, Nikse555.

There is an error in how the <Unknown words> list is built.



Lines #40, #73 and #81 contain the word FBl, which should be FBI.
I had already added the word FBI to the "names.xml" dictionary once.
With <Add pair to OCR replace list> I add the pair FBl -> FBI. I start OCR and get this:



In the <Subtitle text> window you can see that the replacement has been made and the word is known. The green color of this line confirms it.
However, line #40: FBl still hangs in <Unknown words>, although #73 and #81 are gone.
Adding more word pairs works correctly: they do not appear in the list again. Well, unless the new word is not in the dictionary.
In this particular case, line #40 disappears only when I close the <Import / OCR Blu-ray ...> window and start the whole OCR process again.
But then another line with a different word becomes the first one and stays with us until the window is closed.
I also checked this with words added to the dictionary: the first displayed line with an unknown word does not disappear.

Edit 1
  • The duplicated first lines always appear on the second and subsequent scans of the file, in all 3 lists, also after changes made automatically
    by the rules from the OCRFixReplaceList_User and OCRFixReplaceList files, or with the <Fix common OCR errors ...> option enabled in Options/Settings/Tools.
    They do not appear for the automatic conversion of "l" (lowercase L) into "I" by Subtitle Edit, but that conversion is not shown in any of the lists anyway,
    except as an unknown word, when such a replacement creates a new incorrect word.
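
The expected behaviour (apply the replace pairs first, then decide which words are unknown) can be sketched as below. This is a simplified model for discussion, not how Subtitle Edit is implemented: `apply_replace_list` and `unknown_words` are hypothetical names, and the small `dictionary` set only stands in for names.xml and the spelling dictionaries.

```python
import re

WORD = re.compile(r"\w+")


def apply_replace_list(text, pairs):
    """Swap every whole word found in `pairs`; return (new_text, fixes)."""
    fixes = []

    def swap(match):
        word = match.group(0)
        if word in pairs:
            fixes.append((word, pairs[word]))
            return pairs[word]
        return word

    return WORD.sub(swap, text), fixes


def unknown_words(text, dictionary):
    """List the words of the corrected text that the dictionary does not know."""
    return [w for w in WORD.findall(text) if w not in dictionary]


pairs = {"FBl": "FBI"}                     # pair added by the user
dictionary = {"The", "FBI", "is", "here"}  # stand-in for names.xml etc.
fixed, fixes = apply_replace_list("The FBl is here", pairs)
print(fixed, fixes, unknown_words(fixed, dictionary))
```

Because the unknown-word check runs on the already corrected text, "FBl" never reaches the unknown list in this model, while the applied pair is still recorded for a fixes list.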

Last edited by Janusz; 15th May 2020 at 11:44.

14th May 2020, 15:12   #969
tormento
Quote:
Originally Posted by Nikse555
Also, "Skip" in the OCR char window will now only skip from current character (and not the whole line).
Thanks and please apply to abort too.

16th May 2020, 20:49   #970
Nikse555
The latest beta has new (and hopefully improved) detection of spaces between italic letters: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Do let me know how it works! (it uses the value from "Set un-italic factor" in the list view context menu - probably normally between 0.22-0.32)
@tormento: thx for the test .sup files

Quote:
Originally Posted by tormento
Thanks and please apply to abort too.
I actually meant that it works for the "Abort" button.


@Janusz: I've fixed an issue related to your last post, but it's really hard to test without your exact setup/sup... could you make a .zip archive with all relevant files, if latest beta still has issues?

Last edited by Nikse555; 16th May 2020 at 20:50. Reason: typo+fixes

16th May 2020, 23:22   #971
GCRaistlin
Registered User
 
Join Date: Jun 2006
Posts: 321
When performing OCR, I am unable to add a proper match for the percent sign (https://mir.cr/10PHMJUD, # 68): SE recognizes its first part as "o". To add a better match, I deleted this "o" from the DB and ran OCR again. This time the first part was recognized as "O", and the 'Delete' button is inactive.
__________________
Magically yours
Raistlin

17th May 2020, 00:52   #972
jlw_4049
The latest BETA struggles with ♪ characters very badly.

17th May 2020, 06:41   #973
Nikse555
Quote:
Originally Posted by GCRaistlin
When performing OCR, I am unable to add a proper match for the percent sign (https://mir.cr/10PHMJUD, # 68): SE recognizes its first part as "o". To add a better match, I deleted this "o" from the DB and ran OCR again. This time the first part was recognized as "O", and the 'Delete' button is inactive.
thx for the file
To fix "%", double-click in the list view in the main OCR window, then right-click in the list box in the "Inspect" window and choose "Add better multi match", then expand the image to cover the "%" sign:




Quote:
Originally Posted by jlw_4049
The latest BETA struggles with ♪ characters very badly.
I probably need more info... subtitle + screenshots... you're using Tesseract for OCR'ing?

17th May 2020, 08:25   #974
Nikse555
Shortcuts for the "OCR Character" window are:


Expand selection: Alt + arrow right
Shrink selection: Alt + arrow left
Toggle italic: Ctrl+I (+ Alt+I depending on translation)
Toggle auto-submit-first-char: Alt+F (depending on translation)
Skip current letter(s): Esc (+ Alt+S depending on translation)
Skip entire subtitle: Ctrl+Shift+S (new shortcut)

17th May 2020, 11:37   #975
tormento
Quote:
Originally Posted by Nikse555
Shortcuts for the "OCR Character" window are
Now that you made me think about it, it would be nice to be able to expand the selection on the right and/or the left side. Sometimes the % character is badly recognized on the left and/or on the right too. The only thing I can do now is abort and enter it manually. Two buttons such as

|←|expand|→|
|→|shrink|←|


or the same buttons changing function with the SHIFT key would be nice.

P.S: The red italic word on the right of the character is great. You could remove the one on top of the window now.

Last edited by tormento; 17th May 2020 at 11:44.

17th May 2020, 22:39   #976
GCRaistlin
What does auto-submit-first-char do?

17th May 2020, 23:02   #977
Nikse555
Quote:
Originally Posted by GCRaistlin
What does auto-submit-first-char do?
It will use the first key pressed as the OCR letter, without waiting for a click on "OK" or a press of the "Enter" key.

I often set the error rate to zero when starting OCR of a new sub for the first 10-20 lines, in which case I add a lot of single letters, and that's much faster without having to press the "Enter" key or the "OK" button.
(you need to turn it off again, if the prompt is for a multi letter image, like "ft")
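
As a rough illustration of the behaviour described above, here is a toy model of the prompt. It is an assumption-laden sketch, not the actual dialog code: the class `OcrCharPrompt`, its attributes and the "Enter" key name are all made up for the example.

```python
class OcrCharPrompt:
    """Toy model of the prompt that asks which text an image represents."""

    def __init__(self, auto_submit_first_char):
        self.auto_submit_first_char = auto_submit_first_char
        self.buffer = ""       # text typed so far
        self.submitted = None  # final answer, once given

    def key_down(self, key):
        if self.submitted is not None:
            return  # prompt already answered, ignore further keys
        if key == "Enter":
            if self.buffer:  # ignore Enter on an empty field
                self.submitted = self.buffer
            return
        self.buffer += key
        if self.auto_submit_first_char:
            self.submitted = self.buffer  # first key is the answer


fast = OcrCharPrompt(auto_submit_first_char=True)
for key in ("f", "t"):
    fast.key_down(key)

slow = OcrCharPrompt(auto_submit_first_char=False)
for key in ("f", "t", "Enter"):
    slow.key_down(key)

print(fast.submitted, slow.submitted)
```

With auto-submit on, the prompt is answered after the first key, so the "t" of "ft" is never accepted. That is the reason the option has to be switched off for multi-letter images.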

17th May 2020, 23:11   #978
GCRaistlin
Nikse555
Thanks. I'd say a brief explanation of this option should be added to the UI, as well as one for "Add better multi match", since these options' names aren't self-explanatory.

17th May 2020, 23:25   #979
GCRaistlin
Quote:
Originally Posted by Nikse555
thx for the file
To fix "%", double-click in the list view in the main OCR window, then right-click in the list box in the "Inspect" window and choose "Add better multi match", then expand the image to cover the "%" sign:
Something went wrong. I performed all the actions above, then reran OCR from this line. SE didn't ask me anything, but the percent sign is missing from the recognized line:


UPD: It seems that I didn't enter "%" into the field. It would be worth checking that the field isn't empty...

Last edited by GCRaistlin; 17th May 2020 at 23:30.

17th May 2020, 23:47   #980
GCRaistlin
Bug(s):
  1. Follow the steps above but add a wrong match, for example "@".
  2. Start OCR from the same line, then interrupt it.
  3. Open the 'Inspect compare matches' window.
  4. Delete the wrong match, add the right match, press OK.
  5. Start OCR from the same line again.
    You'll get the 'VobSub - Manual image to text' window for the char you have just added a match for. By the way, the window title is incorrect: it's not VobSub that is being recognized. But let's go on.
  6. Press Abort, then try to add the multi match again. You'll get an 'Image already in db' error.

Last edited by GCRaistlin; 18th May 2020 at 10:07.

All times are GMT +1.