Old 29th June 2020, 08:02   #1141  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@GCRaistlin
Use the US English dictionary for OCR and select [Fix OCR errors] and [Try to guess unknown words]; as a result you will get:
Code:
 I'll <i>vafangool</i> you!
__________________
Sorry for my mistakes - I'm using a translator.
Old 29th June 2020, 10:17   #1142  |  Link
GCRaistlin
Registered User
 
GCRaistlin's Avatar
 
Join Date: Jun 2006
Posts: 350
Janusz
What does it have to do with the reported issue? This time your trick helps (maybe, I didn't check), next time it won't.
__________________
Windows 8.1 x64

Magically yours
Raistlin
Old 30th June 2020, 08:55   #1143  |  Link
varekai
Registered User
 
varekai's Avatar
 
Join Date: Jul 2006
Posts: 528
I'll vafangool you!
Old 11th July 2020, 08:34   #1144  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Nikse555 View Post
Could you provide a image/sup so I can try it?
I saw you updated the beta, but you never replied to my post.
__________________
@turment on Telegram
Old 16th July 2020, 17:28   #1145  |  Link
jlw_4049
Registered User
 
Join Date: Sep 2018
Posts: 391
Still having major issues with music notes in the latest beta version for tesseract/binary.

http://www.mediafire.com/file/2pyfyn...ample.sup/file

Here is a file that I've had the issues with.
Old 18th July 2020, 09:05   #1146  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
@Tormento: I've tested your sup file and it works fine... I don't get the strange replacements that you get, so you should probably do a clean install (delete all old SE files first, including those in %appdata%\Subtitle Edit).
EDIT: Also, the latest beta has improved casing in OCR for the Italian letter "Ú": https://github.com/SubtitleEdit/subt...leEditBeta.zip

@jlw_4049: You should request better support for music symbols for tesseract here: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Or you could try "nOCR" or "Binary image compare"...

Last edited by Nikse555; 18th July 2020 at 09:07.
Old 18th July 2020, 10:00   #1147  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Nikse555 View Post
I've tested your sup file and it works fine.
You are right about "Es", I have found it in ita_OCRFixReplaceList_User.xml somehow...

The "i" issue comes from ita_OCRFixReplaceList.xml, where "l" is replaced by "i", and sometimes an "I" is OCRed as "l".
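
For anyone who wants to check their own replace list for a rule like this, here is a minimal Python sketch. It assumes the rules are stored as XML elements carrying `from` and `to` attributes. The sample XML and the function name `find_rules` are made up for illustration and are not copied from the real ita_OCRFixReplaceList.xml.

```python
import xml.etree.ElementTree as ET


def find_rules(xml_text, src, dst):
    """Return (tag, from, to) for every element whose from/to attributes match."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        # Assumption: each rule is an element with "from" and "to" attributes.
        if elem.get("from") == src and elem.get("to") == dst:
            hits.append((elem.tag, elem.get("from"), elem.get("to")))
    return hits


# Hypothetical sample data, only for demonstration.
SAMPLE = """<OCRFixReplaceList>
  <WholeWords>
    <Word from="l" to="i" />
    <Word from="ll" to="Il" />
  </WholeWords>
</OCRFixReplaceList>"""

if __name__ == "__main__":
    for hit in find_rules(SAMPLE, "l", "i"):
        print(hit)
```

To run it against a real file, read the file contents into a string and pass that string instead of `SAMPLE`.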

Quote:
Originally Posted by Nikse555 View Post
Also, the latest beta has improved casing in OCR for the Italian letter "Ú"
We don't have any Ú letter in Italian.

We do have ù and Ù.
__________________
@turment on Telegram

Last edited by tormento; 18th July 2020 at 10:32.
Old 19th July 2020, 06:35   #1148  |  Link
loninapleton
Registered User
 
Join Date: Jun 2019
Posts: 60
A simple save from ASS to SRT

Not hijacking anything. I just need to know if this is the main forum discussion for Subtitle Edit. I am just beginning to do translations. Some online tools are available, but my current need is to get an ASS file, translated from English to Polish, saved as SRT or VOB in a form that MKVToolNix recognizes.

I'm only getting a text save in Subtitle Edit. Please give the steps for getting this kind of save. And thank you for this amazing tool.
Old 19th July 2020, 07:21   #1149  |  Link
jlw_4049
Registered User
 
Join Date: Sep 2018
Posts: 391
Quote:
Originally Posted by Nikse555 View Post

@jlw_4049: You should request better support for music symbols for tesseract here: https://github.com/SubtitleEdit/subt...leEditBeta.zip
Or you could try "nOCR" or "Binary image compare"...
I'm not sure what nOCR is. All I see is binary or tesseract.

I'll look more into it tomorrow when I get off.

Old 19th July 2020, 08:32   #1150  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
@tormento: I could not find lines where "l" is replaced by "i"... could you give some line numbers? (thx about the Italian accented letter U)

@loninapleton: You can open the ASS file and change format in the toolbar to "SubRip (.srt)" (SubRip is the topmost format in the drop down list).
You can also convert multiple ASS files to SubRip (.srt) via Tools -> Batch convert or by using command line convert.
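
If you want to see what such a conversion involves outside of Subtitle Edit, here is a minimal Python sketch. It is not how Subtitle Edit does it. It assumes the default ASS field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text), drops all `{...}` override tags, and keeps the cues in file order. The function names and the sample text are made up for illustration.

```python
import re


def ass_time_to_srt(t):
    """Convert an ASS timestamp (h:mm:ss.cc) to SRT form (hh:mm:ss,mmm)."""
    h, m, s = t.strip().split(":")
    sec, _, frac = s.partition(".")
    ms = int((frac + "000")[:3])  # pad centiseconds to milliseconds
    return f"{int(h):02d}:{int(m):02d}:{int(sec):02d},{ms:03d}"


def ass_to_srt(ass_text):
    """Turn the Dialogue lines of an ASS file into SRT text."""
    cues = []
    for line in ass_text.splitlines():
        if not line.startswith("Dialogue:"):
            continue
        # Assumption: default field order, so the text is the 10th field.
        fields = line[len("Dialogue:"):].split(",", 9)
        if len(fields) < 10:
            continue
        text = re.sub(r"\{[^}]*\}", "", fields[9])  # strip override tags
        text = text.replace("\\N", "\n").replace("\\n", "\n")
        cues.append((ass_time_to_srt(fields[1]),
                     ass_time_to_srt(fields[2]),
                     text.strip()))
    blocks = [f"{i}\n{start} --> {end}\n{text}\n"
              for i, (start, end, text) in enumerate(cues, 1)]
    return "\n".join(blocks)


# Made-up sample input, only for demonstration.
SAMPLE = r"""[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,Hello, {\i1}world{\i0}\NSecond line
"""

if __name__ == "__main__":
    print(ass_to_srt(SAMPLE))
```

Italics and other styling are lost in this sketch, so for real work the toolbar format switch or Batch convert is the better route.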

@jlw_4049: If you cannot see the OCR method "nOCR" then you probably don't use SE 3.5.16?
Old 19th July 2020, 09:38   #1151  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Nikse555 View Post
could you give some line numbers?
Line 33 of ita_OCRFixReplaceList.xml
__________________
@turment on Telegram
Old 19th July 2020, 10:25   #1152  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Originally Posted by tormento View Post
Line 33 of ita_OCRFixReplaceList.xml
Do you also have a line number in .sup file?
Old 19th July 2020, 12:15   #1153  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Nikse555 View Post
Do you also have a line number in .sup file?
1557

Even after removing the OCR line I mentioned, it still wrongly OCRs "I" as "i".
__________________
@turment on Telegram
Old 19th July 2020, 13:41   #1154  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Originally Posted by tormento View Post
1557

Even after removing the OCR line I mentioned, it still wrongly OCRs "I" as "i".
I get
Code:
INDIGNAZIONE: I CINQUE MOTIVI
PER CUl O.J. SIMPSON SE L'É CAVATA
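
The output above appears to contain a lowercase "l" inside the otherwise upper-case word "CUl" (presumably "CUI"). A small Python sketch of one way to catch that kind of slip follows. This is only an illustration of the idea, not Subtitle Edit's actual fix logic, and the function name `fix_l_in_caps` is made up.

```python
import re


def fix_l_in_caps(text):
    """Replace lowercase 'l' with 'I' in words that are otherwise all upper case."""
    def repl(match):
        word = match.group(0)
        rest = word.replace("l", "")
        # Require at least two other letters, all upper case, so that
        # ordinary words such as "Al" or "Il" are left alone.
        if "l" in word and len(rest) >= 2 and rest.isupper():
            return word.replace("l", "I")
        return word

    # [^\W\d_]+ matches runs of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", repl, text)


if __name__ == "__main__":
    print(fix_l_in_caps("PER CUl O.J. SIMPSON SE L'É CAVATA"))
```

Mixed-case words are left untouched, so normal text with a real lowercase "l" is not affected.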
Old 19th July 2020, 13:44   #1155  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Nikse555 View Post
I get
Code:
INDIGNAZIONE: I CINQUE MOTIVI
PER CUl O.J. SIMPSON SE L'É CAVATA

WTF.

Apart from the wrong É (it should be È), it looks like your OCR doesn't have the same issue as mine.

Need to sort this thing out.

Perhaps some regional setting? I had problems with an AVS script some time ago.
__________________
@turment on Telegram
Old 19th July 2020, 13:48   #1156  |  Link
jlw_4049
Registered User
 
Join Date: Sep 2018
Posts: 391
Quote:
Originally Posted by Nikse555 View Post
@tormento: I could not find lines where "l" is replaced by "i"... could you give some line numbers? (thx about the Italian accented letter U)

@loninapleton: You can open the ASS file and change format in the toolbar to "SubRip (.srt)" (SubRip is the topmost format in the drop down list).
You can also convert multiple ASS files to SubRip (.srt) via Tools -> Batch convert or by using command line convert.

@jlw_4049: If you cannot see the OCR method "nOCR" then you probably don't use SE 3.5.16?
I downloaded the latest BETA recently. Maybe I need to delete everything and replace it.

Old 19th July 2020, 18:52   #1157  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@jlw_4049
I described what needs to be done to access nOCR here: https://forum.doom9.org/showthread.p...45#post1913645
__________________
Sorry for my mistakes - I'm using a translator.
Old 19th July 2020, 22:56   #1158  |  Link
loninapleton
Registered User
 
Join Date: Jun 2019
Posts: 60
Quote:
Originally Posted by Nikse555 View Post
@tormento: I could not find lines where "l" is replaced by "i"... could you give some line numbers? (thx about the Italian accented letter U)

@loninapleton: You can open the ASS file and change format in the toolbar to "SubRip (.srt)" (SubRip is the topmost format in the drop down list).
You can also convert multiple ASS files to SubRip (.srt) via Tools -> Batch convert or by using command line convert.
Thank you. I knew I was missing something-- the tool bar part. I'll try it again. I must be in the right place. :-)
Old 20th July 2020, 10:21   #1159  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Quote:
Originally Posted by Nikse555 View Post
I get
Code:
INDIGNAZIONE: I CINQUE MOTIVI
PER CUl O.J. SIMPSON SE L'É CAVATA
OK, it was enough to delete *user*.xml and install the latest beta.
__________________
@turment on Telegram
Old 20th July 2020, 16:03   #1160  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 2,542
Update: I tried writing a single-line SRT manually and running Fix common errors on it, skipping the OCR process.

I found that if I save the letter "i" to Names (it_names_user.xml), Subtitle Edit wants to change "I" to "i".

I usually save single-letter words ("i", "a", etc.) to Names because they can't be found in the Italian dictionary and they stop the OCR processing.

Any idea how to solve this issue?
__________________
@turment on Telegram