Old 2nd June 2020, 17:43   #1041  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Originally Posted by Melan View Post
thx for re-testing, could you supply the steps to re-create the crash?
Old 2nd June 2020, 19:38   #1042  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Beta updated: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Now extended chars in nOCR can also be edited/deleted.

Please give the new "nOCR" a go
It's based on lines rather than images, so it works better with scaling than "Binary image compare". Works best with larger fonts. Can be "auto trained" with your own supplied letters/language + fonts (Ctrl+T in OCR window starts training window)
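To illustrate why a line-based matcher copes with scaling, here is a minimal Python sketch. It is not Subtitle Edit's actual nOCR code: the `LineTemplate` class, the relative 0..1 coordinates and the sampling step count are all assumptions made for the example. The idea is that a character is described by strokes that must hit ink and strokes that must stay empty, and those strokes are stretched to whatever size the glyph image happens to be.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y) in relative 0..1 glyph coordinates
Line = Tuple[Point, Point]


@dataclass
class LineTemplate:
    """Hypothetical template: strokes that must be ink, and strokes that must be empty."""
    text: str
    foreground: List[Line]
    background: List[Line]


def sample(line: Line, width: int, height: int, steps: int = 8) -> Iterator[Tuple[int, int]]:
    """Yield pixel coordinates along a relative line, scaled to the bitmap size."""
    (x0, y0), (x1, y1) = line
    for i in range(steps + 1):
        t = i / steps
        x = min(width - 1, int((x0 + (x1 - x0) * t) * width))
        y = min(height - 1, int((y0 + (y1 - y0) * t) * height))
        yield x, y


def matches(template: LineTemplate, bitmap: Sequence[Sequence[int]]) -> bool:
    """True if every foreground line lies on ink and every background line on empty pixels."""
    height, width = len(bitmap), len(bitmap[0])
    for line in template.foreground:
        if not all(bitmap[y][x] for x, y in sample(line, width, height)):
            return False
    for line in template.background:
        if any(bitmap[y][x] for x, y in sample(line, width, height)):
            return False
    return True
```

Because the template stores relative positions, the same definition can be checked against a 3x5 glyph and a 6x10 glyph without keeping one reference image per size, which is the limitation of comparing bitmaps pixel by pixel.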
"Binary image compare" can also be combined with a fallback-to-nOCR.
Old 2nd June 2020, 21:07   #1043  |  Link
kerry7
Registered User
 
Join Date: May 2020
Posts: 13
I just saw that on the Github repo, they have committed an .exe file... that really hurts
__________________
Techy lover addicted to Raspberry Pi

Last edited by kerry7; 2nd August 2020 at 17:37.
Old 2nd June 2020, 21:16   #1044  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Originally Posted by kerry7 View Post
I just saw that on the Github repo, they have committed an .exe file... that really hurts
Where?

EDIT: It's totally normal to include 3rd party software as binaries... but "Subtitle Edit" should be committed as source (I've seen a few projects where they ONLY committed the .exe file - now that's scary!)

Last edited by Nikse555; 2nd June 2020 at 21:24.
Old 2nd June 2020, 23:57   #1045  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@ Nikse555



"Batman" problem. Sup file to download.

The file can be read using Latin.nocr, or you can train a new character set just for Arial 65/100, and then it will be perfect.
To show what I have a problem with, I selected several correct lines and several lines from the original file.
I also added 4 lines from "Ź" that are not in the original file.
Due to the character set used and OCR errors, the resulting image may differ so I will explain:

- good lines are: 1, 2, 7, 8, 9, 10, 12, 13. Why? Certainly not because Ż, Ź, Ś are only in the top line, but I explain it to myself.

- problematic lines are: 3, 4, 5, 6, 11, and 14. Here Ż, Ś, Ź is in the bottom line. The place where "*" appears depends on which line is longer.
Sometimes it is the beginning of the line, other times we additionally lose the character from the top line.
In these lines, when the [Draw missing texts] option is enabled, OCR calls for a character,
but not for the capital letter with the index, that is: Ż, Ś, Ź, and only for the index itself.

- finally the "pearl" line 15. Characters with the index are in both the top and bottom line, and yet the line was read correctly.

I understand why this is happening and I think it can be solved.
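The behaviour described above can be reproduced with a simple row-based line splitter. The Python sketch below is only an illustration, not Subtitle Edit's splitter: the function names and the `min_line_height` threshold are assumptions. It finds bands of rows that contain ink, then attaches any band that is too thin to be a text line (such as the dot of "Ż" sitting above the bottom line) to the band below it, so the accent stays with its letter.

```python
from typing import List, Sequence, Tuple

Band = Tuple[int, int]  # (first_row, last_row), inclusive


def ink_bands(bitmap: Sequence[Sequence[int]]) -> List[Band]:
    """Return one band for each run of consecutive rows that contain ink."""
    bands: List[Band] = []
    start = None
    for y, row in enumerate(bitmap):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(bitmap) - 1))
    return bands


def merge_accent_bands(bands: List[Band], min_line_height: int) -> List[Band]:
    """Attach bands thinner than min_line_height to the next full band below."""
    merged: List[Band] = []
    pending_start = None  # top row of thin band(s) waiting for a line below
    for top, bottom in bands:
        if bottom - top + 1 < min_line_height:
            if pending_start is None:
                pending_start = top
            continue
        merged.append((top if pending_start is None else pending_start, bottom))
        pending_start = None
    if pending_start is not None:
        # A thin band at the very bottom has nothing below it, so join it upward.
        if merged:
            merged[-1] = (merged[-1][0], bands[-1][1])
        else:
            merged.append((pending_start, bands[-1][1]))
    return merged
```

Without the merge step, the accent band is handed to OCR on its own, which would match the case where OCR asks for the index alone instead of the whole letter.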
__________________
Sorry for my mistakes - I'm using a translator.
Old 3rd June 2020, 01:31   #1046  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,542
@Nikse555

Is there any format that allows OCR recognition of both upper and lower screen text (typically anime)?

I have tried setting .ass in the main window, but when OCR finds both upper and lower screen text it skips the formatting, while it works when the text is only in the upper part (.srt works too).

If it's a limitation of OCR, would it be possible to add the feature?

Example.

Please notice that some negative values for subtitles are present too, perhaps because OCR doesn't know how to manage upper and lower screen text at the same time.
__________________
@turment on Telegram

Last edited by tormento; 3rd June 2020 at 02:48.
Old 3rd June 2020, 06:46   #1047  |  Link
Melan
Registered User
 
Join Date: Jan 2014
Location: Poland
Posts: 64
Quote:
Originally Posted by Nikse555 View Post
thx for re-testing, could you supply the steps to re-create the crash?
Nothing special (I hope) . I run SE, parse the file and run the nocr module.
I sent the files by mail.
Old 3rd June 2020, 08:50   #1048  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@tormento

This is what it consists of:

Code:
1
00:00:01,000 --> 01:00:01,000
{\an8}Żeby budzić strach w innych,
musisz zapanować nad własnym.

2
00:00:06,209 --> 00:00:09,163
Żeby pokonać strach,
trzeba się nim stać.

3
00:00:10,163 --> 00:00:13,782
Wiesz, czemu upadamy?
Żebyśmy mogli się pozbierać.

4
00:00:14,782 --> 00:00:18,474
Podobno twój tata błagał o litość.
Żebrał jak pies.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 3rd June 2020 at 08:55.
Old 3rd June 2020, 08:56   #1049  |  Link
kerry7
Registered User
 
Join Date: May 2020
Posts: 13
Quote:
Originally Posted by Nikse555 View Post
Where?

EDIT: It's totally normal to include 3rd party software as binaries... but "Subtitle Edit" should be committed as source (I've seen a few projects where they ONLY committed the .exe file - now that's scary!)
In the root of the project; the file is `vswhere.exe`. It is a bit confusing because the last tag of the project is around 3.0.X, yet the commit message says `Update wswhere to 2.3.2`. What does that mean? (Just out of curiosity, I would like to learn.)
__________________
Techy lover addicted to Raspberry Pi

Last edited by kerry7; 2nd August 2020 at 17:36.
Old 3rd June 2020, 11:12   #1050  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Janusz View Post
This is what it consists of
Try to ocr my sup.

SE can't really do a good job and mixes upper with lower when both are present.
__________________
@turment on Telegram
Old 3rd June 2020, 19:02   #1051  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Originally Posted by Melan View Post
Nothing special (I hope) . I run SE, parse the file and run the nocr module.
I sent the files by mail.
Thx for the info + files
Yes, I got the error too - now fixed in latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip


@kerry7:
"vswhere" is a small tool that helps (in exe form) to compile Subtitle Edit: https://github.com/microsoft/vswhere
"vswhere" was version "2.3.2"... which has nothing to do with the SE version number. I just updated "vswhere" to 2.8.4 - see https://github.com/SubtitleEdit/subt...4aa29b1a33e13c

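For anyone curious what "vswhere" is typically used for: it locates Visual Studio installations so a build script can find MSBuild. The Python sketch below shows one common invocation. How Subtitle Edit's own build script calls it is not shown in this thread, so the wrapper functions here are hypothetical, and the example assumes a vswhere release that supports the `-find` option.

```python
import subprocess
from typing import List, Optional


def vswhere_msbuild_args(vswhere_path: str = "vswhere.exe") -> List[str]:
    """Build the command line that asks vswhere for the newest MSBuild.exe."""
    return [
        vswhere_path,
        "-latest",                                   # newest installed instance only
        "-requires", "Microsoft.Component.MSBuild",  # instance must include MSBuild
        "-find", r"MSBuild\**\Bin\MSBuild.exe",      # print matching file paths
    ]


def find_msbuild(vswhere_path: str = "vswhere.exe") -> Optional[str]:
    """Return the first MSBuild.exe path reported by vswhere, or None on failure."""
    try:
        result = subprocess.run(
            vswhere_msbuild_args(vswhere_path),
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # vswhere missing or it exited with an error
    lines = result.stdout.splitlines()
    return lines[0].strip() if lines else None
```

The version number of the bundled tool only describes vswhere itself, which is why it is unrelated to the Subtitle Edit version.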

@tormento:
Sorry, SE does not support this (besides all text at top).
This is pretty complex - text can be all over and even vertical.
Old 3rd June 2020, 20:29   #1052  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Nikse555 View Post
Sorry, SE does not support this (besides all text at top). This is pretty complex - text can be all over and even vertical.
It would be more than enough to support overlapping subtitles at top and bottom.
__________________
@turment on Telegram
Old 4th June 2020, 17:16   #1053  |  Link
Melan
Registered User
 
Join Date: Jan 2014
Location: Poland
Posts: 64
Quote:
Originally Posted by Nikse555 View Post
Thx for the info + files
Yes, I got the error too - now fixed in latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip

(...)
I parsed a few files and the error didn't appear.
However, something strange happened. SE doesn't choose Polish characters.

https://i.imgur.com/IZGtqQF.png

Edit.
After the restart, everything returned to normal.

Last edited by Melan; 4th June 2020 at 17:21.
Old 4th June 2020, 23:57   #1054  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@ Nikse555
In Beta 203 import of nOCR character database does not work.
The last one where imports were still active was Beta 194.
__________________
Sorry for my mistakes - I'm using a translator.
Old 5th June 2020, 09:20   #1055  |  Link
Melan
Registered User
 
Join Date: Jan 2014
Location: Poland
Posts: 64
When two characters from both lines are interpreted as one letter then the initial dash always turns into a dot.
https://i.imgur.com/WFhQbb7.png
https://i.imgur.com/D9vXeK8.png


Maybe I'm blind :P, but I really don't see the difference between zero after digit 6 and zero after digit 1.
https://i.imgur.com/oMcXc0a.png

Last edited by Melan; 5th June 2020 at 09:43.
Old 5th June 2020, 09:54   #1056  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
@Janusz: The nOCR import should be fixed now, thx
Also, I'm testing a new line splitter - how does that work for you? It will never be perfect...
Latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip

@Melan:
Could you post or email the subtitle file? (you can e.g. right-click on the ocr-window and export as blu-ray sup)
About the "O"... you have to "Add better match" and enter "0"...
Old 5th June 2020, 10:30   #1057  |  Link
Melan
Registered User
 
Join Date: Jan 2014
Location: Poland
Posts: 64
Quote:
Originally Posted by Nikse555 View Post
Could you post or email the subtitle file? (you can e.g. right-click on the ocr-window and export as blu-ray sup)
Unfortunately, I can't. Screen 20.05 - chat.


I downloaded the B208 version and ...
https://i.imgur.com/6P8JvB2.png

Last edited by Melan; 5th June 2020 at 10:49.
Old 5th June 2020, 13:25   #1058  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
@Melan: OK, got the crash too... should be fixed here: https://github.com/SubtitleEdit/subt...leEditBeta.zip

@Janusz: Also, made some fixes (hopefully) to the new line-splitter in above beta too.
Old 5th June 2020, 15:05   #1059  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
And due to some bugs in the new image line splitter... a new beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Old 5th June 2020, 15:47   #1060  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
Quote:
Originally Posted by Nikse555 View Post
And due to some bugs in the new image line splitter... a new beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Beta 8 already worked well, but it crashed on line 174 with the _index.html file from the "Batman Begins" directory and the character database added with the file. I do not know why. Beta 12 passes through this line without failure and works flawlessly.
[Draw Missing texts] was awarded. If [Draw Missing texts] is checked, on line 74 OCR will call for ",". This sign is strangely marked in the top window, although the image at the bottom is correct. It looks the same in Beta 12.
Well done, thank you.
-----
Edit 1:
As we are at Batman, please note what is happening now with the 758 line. Earlier versions did not do that.
If the error cannot be reproduced, I will insert a picture. It looks like some noise picked up by OCR.
-----
I would add that out of 1137 images it appears only in this one. I also ran OCR on a file that consists of 5489 images and nothing like this ever happened. Running OCR on an import of just this one image does not generate the error.

Edit 2:
Just like @Melan showed here:



It's just that the whole sign is visible and I have some scraps of different signs from the bottom line.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 5th June 2020 at 20:15.