Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.
2nd June 2020, 19:38 | #1042
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
Beta updated: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Extended characters in nOCR can now also be edited/deleted. Please give the new "nOCR" a go. It's based on lines rather than images, so it handles scaling better than "Binary image compare". It works best with larger fonts and can be "auto trained" with your own supplied letters/language and fonts (Ctrl+T in the OCR window opens the training window). "Binary image compare" can also be combined with a fallback to nOCR.
2nd June 2020, 21:07 | #1043
Registered User
Join Date: May 2020
Posts: 13
I just saw that someone committed an .exe file to the GitHub repo... that really hurts.
__________________
Techy lover addicted to Raspberry Pi
Last edited by kerry7; 2nd August 2020 at 17:37.
2nd June 2020, 21:16 | #1044
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
EDIT: It's totally normal to include 3rd-party software as binaries... but "Subtitle Edit" itself should be committed as source (I've seen a few projects where they ONLY committed the .exe file - now that's scary!)
Last edited by Nikse555; 2nd June 2020 at 21:24.
2nd June 2020, 23:57 | #1045
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
@Nikse555
"Batman" problem. Sup file to download. The file can be read using Latin.nocr, or you can train a new character set just for Arial 65/100, and then it will be perfect. To show the problem, I selected several correct lines and several problematic lines from the original file. I also added 4 lines with "Ź" that are not in the original file. Because of the character set used and OCR errors, your results may differ, so I will explain:
- Good lines: 1, 2, 7, 8, 9, 10, 12, 13. Why? I'm not certain it's because Ż, Ź, Ś appear only in the top line, but that's how I explain it to myself.
- Problematic lines: 3, 4, 5, 6, 11 and 14. Here Ż, Ś, Ź are in the bottom line. Where the "*" appears depends on which line is longer. Sometimes it is at the beginning of the line; other times we also lose a character from the top line. In these lines, when the [Draw missing texts] option is enabled, OCR asks for a character, but not for the whole capital letter with its diacritic (Ż, Ś, Ź), only for the diacritic mark itself.
- Finally, the "pearl", line 15: letters with diacritics are in both the top and bottom lines, and yet the line was read correctly.
I understand why this is happening and I think it can be solved.
__________________
Sorry for my mistakes - I'm using a translator.
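Janusz's report suggests that, when a subtitle image is split into text lines, the diacritic above a capital at the start of the bottom line (the dot of Ż, the accent of Ś/Ź) can end up separated from its base letter. As a rough illustration of the general idea only, and not Subtitle Edit's actual line splitter (whose code is not shown in this thread), here is a minimal Python sketch of a horizontal-projection splitter that attaches a thin band of ink to the line directly below it. The function names `ink_bands`/`split_lines` and the `max_mark_height`/`max_gap` thresholds are hypothetical.

```python
from typing import List, Optional, Tuple

Band = Tuple[int, int]  # (first_row, last_row), both inclusive


def ink_bands(image: List[List[int]]) -> List[Band]:
    """Return runs of consecutive rows that contain at least one ink pixel (1)."""
    bands: List[Band] = []
    start: Optional[int] = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(image) - 1))
    return bands


def split_lines(image: List[List[int]], max_mark_height: int = 3,
                max_gap: int = 2) -> List[Band]:
    """Group ink bands into text lines.

    A band no taller than max_mark_height (assumed to be a diacritic, such as
    the dot of Ż) is merged with the band directly below it when the empty gap
    between them is at most max_gap rows.
    """
    bands = ink_bands(image)
    lines: List[Band] = []
    i = 0
    while i < len(bands):
        top, bottom = bands[i]
        if i + 1 < len(bands):
            next_top, next_bottom = bands[i + 1]
            is_thin = bottom - top + 1 <= max_mark_height
            is_close = next_top - bottom - 1 <= max_gap
            if is_thin and is_close:
                lines.append((top, next_bottom))
                i += 2
                continue
        lines.append((top, bottom))
        i += 1
    return lines


# Toy bitmap: a 5-row top line, a lone dot, then a 5-row bottom line.
image = ([[1, 1, 1, 1]] * 5 + [[0, 0, 0, 0]] + [[0, 1, 0, 0]]
         + [[0, 0, 0, 0]] + [[1, 1, 1, 1]] * 5)
print(split_lines(image))  # [(0, 4), (6, 12)]
```

A pure row-projection split like this breaks down when descenders from the top line share rows with the diacritic of the bottom line (no empty row separates them); that case needs grouping by connected components instead of by rows.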
3rd June 2020, 01:31 | #1046
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,542
@Nikse555
Is there any format that supports OCR of text at both the top and the bottom of the screen at the same time (typically in anime)? I tried setting .ass in the main window, but when OCR finds text at both the top and the bottom, it drops the positioning, while it works when there is only text at the top (.srt works too). If this is an OCR limitation, would it be possible to add the feature? Example. Please note that some subtitles also get negative timing values, perhaps because OCR doesn't know how to handle top and bottom text at the same time.
__________________
@turment on Telegram
Last edited by tormento; 3rd June 2020 at 02:48.
3rd June 2020, 08:50 | #1048
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
@tormento
This is what it looks like:
Code:
1
00:00:01,000 --> 01:00:01,000
{\an8}Żeby budzić strach w innych, musisz zapanować nad własnym.

2
00:00:06,209 --> 00:00:09,163
Żeby pokonać strach, trzeba się nim stać.

3
00:00:10,163 --> 00:00:13,782
Wiesz, czemu upadamy? Żebyśmy mogli się pozbierać.

4
00:00:14,782 --> 00:00:18,474
Podobno twój tata błagał o litość. Żebrał jak pies.
__________________
Sorry for my mistakes - I'm using a translator.
Last edited by Janusz; 3rd June 2020 at 08:55.
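The `{\an8}` at the start of cue 1 is the ASS-style override tag for top-centre placement, which many SRT players and editors honour; cues without it fall back to the default bottom position. As an illustration only, the following minimal Python sketch reads cues like the ones above and reports which are placed at the top and how long each lasts. The names `Cue`, `to_ms` and `parse_srt` are hypothetical, and simple, well-formed SRT is assumed.

```python
import re
from dataclasses import dataclass
from typing import List

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str

    @property
    def at_top(self) -> bool:
        # {\an8} is the ASS override tag for top-centre alignment.
        return "{\\an8}" in self.text

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as 00:00:06,209 to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(data: str) -> List[Cue]:
    """Parse simple, well-formed SRT text (blocks separated by blank lines)."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_ms(start), to_ms(end),
                        "\n".join(lines[2:])))
    return cues


sample = r"""1
00:00:01,000 --> 01:00:01,000
{\an8}Żeby budzić strach w innych, musisz zapanować nad własnym.

2
00:00:06,209 --> 00:00:09,163
Żeby pokonać strach, trzeba się nim stać.
"""

for cue in parse_srt(sample):
    print(cue.index, "top" if cue.at_top else "bottom", cue.duration_ms)
# 1 top 3600000
# 2 bottom 2954
```

A duration check like this makes oddities easy to spot: cue 1 as posted spans a full hour (00:00:01,000 to 01:00:01,000), and negative timings like the ones tormento mentions would show up with `[c.index for c in parse_srt(sample) if c.duration_ms < 0]`. Real-world files (missing indices, stray whitespace, malformed blocks) need more defensive parsing than this.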
3rd June 2020, 08:56 | #1049
Registered User
Join Date: May 2020
Posts: 13
In the root of the project, the file is `vswhere.exe`. It's a bit confusing, because the latest tag of the project is around 3.0.X, yet the commit message says `Update vswhere to 2.3.2`. What does that mean? (Just curious, I'd like to learn.)
__________________
Techy lover addicted to Raspberry Pi
Last edited by kerry7; 2nd August 2020 at 17:36.
3rd June 2020, 19:02 | #1051
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Yes, I got the error too - now fixed in the latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
@kerry7: "vswhere" is a small tool (in exe form) that helps compile Subtitle Edit: https://github.com/microsoft/vswhere "vswhere" was at version "2.3.2"... which has nothing to do with the SE version number. I just updated "vswhere" to 2.8.4 - see https://github.com/SubtitleEdit/subt...4aa29b1a33e13c
@tormento: Sorry, SE does not support this (other than when all the text is at the top). This is pretty complex - text can be anywhere on the screen, and even vertical.
4th June 2020, 17:16 | #1053
Registered User
Join Date: Jan 2014
Location: Poland
Posts: 64
Quote:
However, something strange happened: SE doesn't select the Polish characters. https://i.imgur.com/IZGtqQF.png Edit: After a restart, everything went back to normal.
Last edited by Melan; 4th June 2020 at 17:21.
5th June 2020, 09:20 | #1055
Registered User
Join Date: Jan 2014
Location: Poland
Posts: 64
When characters from the top and bottom lines are interpreted as a single letter, the leading dash always turns into a dot.
https://i.imgur.com/WFhQbb7.png https://i.imgur.com/D9vXeK8.png Maybe I'm blind :P, but I really don't see the difference between the zero after the 6 and the zero after the 1. https://i.imgur.com/oMcXc0a.png
Last edited by Melan; 5th June 2020 at 09:43.
5th June 2020, 09:54 | #1056
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
@Janusz: The nOCR import should be fixed now, thx
Also, I'm testing a new line splitter - how does it work for you? It will never be perfect... Latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip @Melan: Could you post or email the subtitle file? (You can, e.g., right-click in the OCR window and export as Blu-ray sup.) About the "O"... you have to use "Add better match" and enter "0"...
5th June 2020, 10:30 | #1057
Registered User
Join Date: Jan 2014
Location: Poland
Posts: 64
Quote:
I downloaded the B208 version and... https://i.imgur.com/6P8JvB2.png
Last edited by Melan; 5th June 2020 at 10:49.
5th June 2020, 13:25 | #1058
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
@Melan: OK, got the crash too... should be fixed here: https://github.com/SubtitleEdit/subt...leEditBeta.zip
@Janusz: I also made some fixes (hopefully) to the new line splitter in the beta above.
5th June 2020, 15:05 | #1059
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
And due to some bugs in the new image line splitter... a new beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip
5th June 2020, 15:47 | #1060
Registered User
Join Date: Apr 2020
Location: Poland
Posts: 143
Quote:
[Draw Missing texts] passed the test. If [Draw Missing texts] is checked, OCR asks for "," on line 74. This character is oddly highlighted in the top window, although the image at the bottom is correct. It looks the same in Beta 12. Well done, thank you. ----- Edit 1: Since we're on Batman, please note what now happens with line 758. Earlier versions did not do that. If the error cannot be reproduced, I will post a picture. It looks like some noise is being picked up by OCR. ----- I would add that, out of 1137 images, it appears only in this one. I also OCRed a file of 5489 images and nothing like this ever happened. Running OCR on an import of just this one image does not produce the error. Edit 2: Just like @Melan showed here: except that the whole character is visible, and I get some scraps of different characters from the bottom line.
__________________
Sorry for my mistakes - I'm using a translator.
Last edited by Janusz; 5th June 2020 at 20:15.