Old 19th March 2020, 22:34   #841  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
@GCRaistlin: Editing the text in "Remove text for HI" could be possible - how does this work: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Old 19th March 2020, 22:52   #842  |  Link
Lucius Snow
Registered User
 
Join Date: Oct 2003
Location: Paris, France
Posts: 92
Quote:
Originally Posted by Nikse555 View Post
@Lucius Snow: Hm, that works fine here... how can I re-create your issue with video not opening? Can you give more information? Is it all .srt files or only some? Video type? Do you have the video window open?
Anyway, mpv seems properly installed because I use it to play videos. I manually installed it. The video does play.

The problem only occurs when I open an existing SRT from the File / Reopen menu. Before the update, this opened both the SRT and the associated video file. Now it doesn't open the video, so I have to open it manually each time I open the SRT.

It happens with any codec / container.
Old 20th March 2020, 07:08   #843  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
@Lucius Snow: Do you still have problems in the latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip ?
If yes, what are the steps for re-creating this in detail (drag-n-drop or file open or shortcuts)?
Old 20th March 2020, 11:54   #844  |  Link
GCRaistlin
Registered User
 
Join Date: Jun 2006
Posts: 302
Quote:
Originally Posted by Nikse555 View Post
some examples where first letters are converted wrongly to uppercase via hard coded rules
Code:
92
00:07:54,641 --> 00:07:56,559
- l would take the idea to its extreme -
- [ Whispering, lndistinct ]

93
00:07:56,643 --> 00:08:00,188
and draw parallels
between reproduction in art. . .

577
00:38:09,245 --> 00:38:13,875
- Well, he said that, uh -
- lt is actually as beautiful as the original.

578
00:38:14,000 --> 00:38:17,295
- that they thought it was an original
for many, many centuries -
- [ Man, ln ltalian ] When was it made?

1091
01:12:28,344 --> 01:12:32,014
- That impression is quite right, but. . .
- [ Crowd Cheering ]

1092
01:12:32,181 --> 01:12:33,808
how can l say. . .
See #93, #578 and #1092. Anyway, "OCR error" means that something was recognized incorrectly, whereas here the first letter may really be lowercase in the original caption. That may be an error in a general sense, but if we just want text that is identical to the graphical source, we surely don't want that kind of AI.

Quote:
Originally Posted by Nikse555 View Post
About the profile on General tab... click the "..." button to make new or delete profiles.
Oh, I see; how could I miss it? But searching for Settings.xml in the current directory first would still be useful in a multi-user environment.
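
For illustration only, the lookup order requested here (current directory first, then the program directory) could be sketched as below. This is not Subtitle Edit's code; `find_settings` and the directory arguments are hypothetical names chosen for the example.

```python
import tempfile
from pathlib import Path


def find_settings(search_dirs, name="Settings.xml"):
    """Return the first existing settings file, or None if no directory has one.

    search_dirs is an ordered list, e.g. [current_dir, program_dir], so a
    per-user copy in the current directory wins over the shared default.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cwd, tempfile.TemporaryDirectory() as app_dir:
        (Path(app_dir) / "Settings.xml").write_text("<Settings/>")
        print(find_settings([cwd, app_dir]))  # only the program-dir copy exists
        (Path(cwd) / "Settings.xml").write_text("<Settings/>")
        print(find_settings([cwd, app_dir]))  # the current-dir copy now wins
```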

Quote:
Originally Posted by Nikse555 View Post
The double-click on word lists feature is available in the latest beta.
Working, thanks. Could you please make pressing 'Add pair' replace an existing word (with confirmation)?

Quote:
Originally Posted by Nikse555 View Post
Editing the text in "Remove text for HI" could be possible - how does this work
It does, thanks again.
__________________
Magically yours
Raistlin
Old 20th March 2020, 13:08   #845  |  Link
Lucius Snow
Registered User
 
Join Date: Oct 2003
Location: Paris, France
Posts: 92
Quote:
Originally Posted by Nikse555 View Post
@Lucius Snow: Do you still have problems in the latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip ?
If yes, what are the steps for re-creating this in detail (drag-n-drop or file open or shortcuts)?
Thank you but there's no change.
Old 20th March 2020, 13:35   #846  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
Quote:
Originally Posted by Lucius Snow View Post
Thank you but there's no change.
But can you give steps to re-create this issue?
Old 20th March 2020, 14:29   #847  |  Link
Lucius Snow
Registered User
 
Join Date: Oct 2003
Location: Paris, France
Posts: 92
Quote:
Originally Posted by Nikse555 View Post
But can you give steps to re-create this issue?
That's what I described earlier:

The problem only occurs when I open an existing SRT from the File / Reopen menu. Before the update, this opened both the SRT and the associated video file. Now it doesn't open the video, so I have to open it manually each time I open the SRT.

It's difficult to explain in more detail.
Old 20th March 2020, 22:28   #848  |  Link
GCRaistlin
Registered User
 
Join Date: Jun 2006
Posts: 302
Nikse555
  1. Quote:
    Currently, List view shows Start time and Duration and lets you edit them. In some cases, editing End time is preferable. Could you please add that option?
    In addition to my previous request, I propose adding a 'Pause before next' field. The result could look like this:
    Code:
    [x] Start time: ___   [ ] Duration:          ___
    [ ] End time:   ___   [x] Pause before next: ___
    The idea is that 0, 1 or 2 checkboxes can be set at the same time. Inactive fields (those with cleared checkboxes) are greyed out, and their values are updated to follow the values in the active fields.
  2. This bug isn't reproducible with the Latin.db in the latest beta, but I decided to report it because it is really strange. Try to OCR these SUP(BD) subtitles with 3.5.14 (clear all checkboxes except [x] Fix OCR errors). The problematic caption is #203. If you start from #100 (skip unrecognized characters twice), "Sunday" will be recognized as "sunday". If you start from #101 (close the current OCR session and start a new one), it will be recognized as "Sunday".
  3. Currently, the installation package contains files that a user may change (Latin.db, en_US_user.xml and so on). Hence, an update may replace them with the default ones. It would be better if all changes were written to files that don't exist by default.
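
A rough sketch of how the four timing fields from item 1 relate to each other, assuming times are held in milliseconds. The names `Cue`, `to_ms`, `with_duration` and `with_pause_before_next` are made up for this example and are not taken from Subtitle Edit. The sample timestamps are captions 92 and 93 from the SRT excerpt posted earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        # Duration is derived: End time minus Start time.
        return self.end_ms - self.start_ms


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as '00:07:54,641' to milliseconds."""
    hms, ms = stamp.split(",")
    h, m, s = (int(part) for part in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def pause_before_next(cue: Cue, nxt: Cue) -> int:
    # Pause before next is the gap between this End time and the next Start time.
    return nxt.start_ms - cue.end_ms


def with_duration(cue: Cue, duration_ms: int) -> Cue:
    # Start time stays fixed; End time follows the edited Duration.
    return Cue(cue.start_ms, cue.start_ms + duration_ms)


def with_pause_before_next(cue: Cue, nxt: Cue, pause_ms: int) -> Cue:
    # Start time stays fixed; End time follows the edited pause.
    return Cue(cue.start_ms, nxt.start_ms - pause_ms)


cue_92 = Cue(to_ms("00:07:54,641"), to_ms("00:07:56,559"))
cue_93 = Cue(to_ms("00:07:56,643"), to_ms("00:08:00,188"))
print(cue_92.duration_ms)                 # 1918
print(pause_before_next(cue_92, cue_93))  # 84
```

With Start time active, editing any one of Duration, End time or Pause before next determines the other two, which is why the inactive fields can simply be recomputed.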

My current Latin.db - maybe you'll find my additions useful for all.
Old 21st March 2020, 10:31   #849  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,426
@Nikse555

I am doing some OCR on idx+sub files and every "I" that begins a sentence is converted to "L".

Can you fix that?

Here is a sample.
__________________
@turment on Telegram
Old 29th March 2020, 11:31   #850  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,426
@Nikse555

I have found some encodings that cause problems in your editor.

Here you can find some samples. I hope the names are self-explanatory enough.

The only ones I can open with no problems in accented vowels and symbols are the UTF16-LE ones. With UTF8 it is a mess.
Old 29th March 2020, 12:08   #851  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
@tormento: I don't think those files are correctly encoded, sorry.
EDIT: The UTF-8 files have UTF-8 BOM (EF BB BF) but they are not using UTF-8 encoding, they are ANSI encoded! Yes, really a mess
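
A minimal sketch of how such a mismatch could be detected: check for the UTF-8 BOM (EF BB BF), then test whether the rest of the file really decodes as UTF-8. Treating "ANSI" as Windows-1252 (`cp1252`) is an assumption made for this example; the real code page depends on the system that wrote the file. The function names are hypothetical.

```python
import codecs


def classify(data: bytes) -> str:
    """Report whether the bytes match what the UTF-8 BOM (if any) claims."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "bom-but-not-utf8" if has_bom else "not-utf8"
    return "utf-8-with-bom" if has_bom else "utf-8"


def decode_subtitle(data: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when valid, otherwise fall back to the assumed ANSI code page."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode(fallback)


# A file like the ones described: UTF-8 BOM followed by ANSI-encoded text.
broken = codecs.BOM_UTF8 + "perché".encode("cp1252")
print(classify(broken))         # bom-but-not-utf8
print(decode_subtitle(broken))  # perché
```

Note the limits of this check: pure-ASCII text is valid in both encodings, and a few ANSI byte sequences happen to be valid UTF-8, so a file can pass the test and still be mislabeled.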

Last edited by Nikse555; 29th March 2020 at 13:34.
Old 30th March 2020, 12:28   #852  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,426
Quote:
Originally Posted by Nikse555 View Post
@tormento: I don't think those files are correctly encoded, sorry. EDIT: The UTF-8 files have UTF-8 BOM (EF BB BF) but they are not using UTF-8 encoding, they are ANSI encoded! Yes, really a mess
They come from the latest version of SubRip. I don't know whether the author is still active, so I'm not sure I can report this mess to him.

What about the other message about "L" OCR?
Old 30th March 2020, 14:41   #853  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
Quote:
Originally Posted by tormento View Post
What about the other message about "L" OCR?
I've updated the latest beta somewhat: https://github.com/SubtitleEdit/subt...leEditBeta.zip

Your subtitle runs very well through the OCR using "Binary image compare" with number-of-pixels-is-space=7 and max-error-pct=1.
I did not have any problems with "L".

What OCR method are you using and what lines are problematic?
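
As a rough illustration of what a "max error percent" threshold can mean when comparing glyph bitmaps: count the pixels that differ and accept the match if their share is within the limit. This is only one plausible reading, not Subtitle Edit's actual "Binary image compare" code; `diff_percent` and `is_match` are hypothetical names.

```python
def diff_percent(a, b):
    """Percentage of differing pixels between two equal-sized 0/1 bitmaps.

    Each bitmap is a list of rows; each row is a list of 0/1 values.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("bitmaps must not be empty")
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return 100.0 * differing / total


def is_match(a, b, max_error_pct=1.0):
    """Accept the stored glyph if the difference is within the threshold."""
    return diff_percent(a, b) <= max_error_pct


stored = [[0] * 10 for _ in range(10)]   # 10x10 reference glyph
scanned = [row[:] for row in stored]
scanned[0][0] = 1                        # one pixel out of 100 differs
print(diff_percent(stored, scanned))     # 1.0
print(is_match(stored, scanned))         # True
```

Under any scheme like this, two glyphs whose bitmaps are identical differ by 0% and cannot be told apart by the image alone.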

Last edited by Nikse555; 31st March 2020 at 05:50.
Old 31st March 2020, 12:57   #854  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,426
Quote:
Originally Posted by Nikse555 View Post
Your subtitle runs very well through the OCR using "Binary image compare" with number-of-pixels-is-space=7 and max-error-pct=1. I did not have any problems with "L".
Clean installation, same settings as yours. I drop the idx onto SE, with no OCR auto-correction enabled.

Many many "L":

570
00:51:39,520 --> 00:51:42,478
L know the book is tough,
but l liked it.

571
00:51:42,600 --> 00:51:43,476
L know.

Have you tried with a fresh installation and an "untrained" OCR database?
Old 31st March 2020, 14:32   #855  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 282
Quote:
Originally Posted by tormento View Post
Clean installation, same settings as yours. I drop the idx onto SE, with no OCR auto-correction enabled.

Many many "L"...
I get "l" (lowercase L) instead of "I" (uppercase i)... because the two images are exactly alike. Enabling "Fix OCR errors" should fix those...

Result here, starting with lowercase "L":
Code:
l know the book is tough,
but l liked it.
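
For anyone post-processing the text outside SE, here is a very small sketch of the kind of correction involved, assuming English subtitles: a lowercase "l" standing alone as a word is almost certainly the pronoun "I". This is not what "Fix OCR errors" does internally; `fix_lone_l` is a made-up helper, and it deliberately leaves cases like "lt" or "ln" alone.

```python
import re

# A lowercase "l" with word boundaries on both sides, e.g. "l know" or "l'm".
_LONE_L = re.compile(r"\bl\b")


def fix_lone_l(text: str) -> str:
    """Replace a standalone lowercase 'l' with uppercase 'I' (English only)."""
    return _LONE_L.sub("I", text)


print(fix_lone_l("l know the book is tough,\nbut l liked it."))
# I know the book is tough,
# but I liked it.
```

The rule would misfire on languages where "l'" is an article (French, Italian), so it should only be applied to English text.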
Old 31st March 2020, 15:20   #856  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,426
Quote:
Originally Posted by Nikse555 View Post
I get "l" (lowercase L) instead of "I" (uppercase i)... because the two images are exactly alike. Enabling "Fix OCR errors" should fix those...

Result here, starting with lowercase "L":
Code:
l know the book is tough,
but l liked it.

I get capital L!
Old 2nd April 2020, 18:15   #857  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,804
Quote:
Originally Posted by tormento View Post
I get capital L!
If I remember correctly, I fixed this by adding the pair l --> I to the OCR fix list (Options -> Settings -> Word lists). You can also apply that replacement after the OCR run.

That subtitle example was very straightforward to OCR, and the characters look like those on most DVDs, so you get a lot of good matches for future subs.

I uploaded my dictionary files and latin.db in case someone finds them useful (Nikse555 can freely use the content with SE if he wants to):

https://drive.google.com/open?id=1Bo...8tAwXRn-IWRgnz
https://drive.google.com/open?id=1Nz...CBOW9K93DZVxHb
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 3rd April 2020, 11:07   #858  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,426
One of the best things about SubRip was the ability to save different matrices and have it automatically scan for the most effective one when loading the subtitle bitmap file.

That makes it possible to avoid polluting the well-trained ones with some unusual subtitle, and to organize them effectively.

Moreover, Subtitle Edit could come with pretrained ones like SupRip did for the most commonly used fonts.

@nikse would you, please?
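
To make the request concrete, here is a hypothetical sketch of "scan for the most effective matrix": score each character database by how many glyphs from a sample of the subtitle it already knows, then pick the best one. It assumes each database can be viewed as a mapping from some glyph key (for example a bitmap hash) to a character; neither SubRip's nor Subtitle Edit's real formats are modeled here.

```python
def hit_rate(database, sample_glyphs):
    """Fraction of sampled glyph keys that the database already contains."""
    if not sample_glyphs:
        return 0.0
    return sum(key in database for key in sample_glyphs) / len(sample_glyphs)


def pick_database(databases, sample_glyphs):
    """Return the name of the database with the highest hit rate.

    databases maps a name to a {glyph_key: character} dict.
    On a tie, the database listed first wins.
    """
    if not databases:
        raise ValueError("no databases to choose from")
    return max(databases, key=lambda name: hit_rate(databases[name], sample_glyphs))


databases = {
    "dvd-arial": {"g1": "a", "g2": "b", "g3": "c"},
    "bd-serif": {"g1": "a", "g9": "z"},
}
print(pick_database(databases, ["g1", "g2", "g3", "g4"]))  # dvd-arial
```

Keeping one database per font or source this way would also avoid mixing an unusual subtitle's glyphs into a well-trained database.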
Old 3rd April 2020, 11:14   #859  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,804
SE does have the ability to use different databases for OCR and the scanning could be useful, I agree.
Old 3rd April 2020, 11:21   #860  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 1,426
Quote:
Originally Posted by Boulder View Post
SE does have the ability to use different databases for OCR and the scanning could be useful, I agree.
Another missing feature is the ability to expand the selection, for example for "%", which sometimes goes wrong in OCR. A point-and-click way to expand it would be even better.