21st May 2020, 16:50   #1009
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Originally Posted by Janusz
Beta 129.
For this example I created a sup file from the text "the tragedy ofAlabama", in which I marked "the tragedy of" as italic.

I only installed the following dictionaries: French, German, Italian, and English, without additional OCRFixReplaceList.xml files.
For French, German, Italian, and English, the patch works OK. The text after OCR looks like this: "<i>the tragedy of</i> Alabama".
For Polish and "none", it looks like this: "<i>the tragedy</i> ofAlabama".
I did not check others, but I think the patch should work in all languages, because the word "ofAlabama" is not correct in any language. In this case a split can always occur at an italics/regular or regular/italics boundary, regardless of the language chosen or how many words exist in the selected dictionary. An example from Polish: "fotografAdam" (photographer Adam).
Yes, the OCR process benefits from a good OCR fix replace list.
I've added a Polish one based on your input here: https://github.com/SubtitleEdit/subt...eplaceList.xml
Feel free to add to it.
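
To illustrate the two ideas discussed here (splitting a joined word when both halves are known, and falling back to a replace list when no dictionary is available), here is a rough Python sketch. It is not how Subtitle Edit implements this: the function names `split_joined_word` and `apply_whole_word_replacements`, the word sets, and the replacement mapping are all hypothetical and only model the behaviour described above.

```python
import re


def split_joined_word(token, known_words):
    """Split a token like "ofAlabama" into "of Alabama" (hypothetical helper).

    known_words is an assumed set of lowercase dictionary words.
    The token is returned unchanged if it is already a known word or
    if no lowercase-to-uppercase split yields two known words.
    """
    if token.lower() in known_words:
        return token
    for i in range(1, len(token)):
        # Only try a split where a lowercase letter meets an uppercase one.
        if token[i - 1].islower() and token[i].isupper():
            left, right = token[:i], token[i:]
            if left.lower() in known_words and right.lower() in known_words:
                return f"{left} {right}"
    return token


def apply_whole_word_replacements(text, replacements):
    """Replace whole words using a from/to mapping (stand-in for a replace list)."""
    def fix(match):
        word = match.group(0)
        return replacements.get(word, word)

    return re.sub(r"\w+", fix, text)


# With a dictionary, the joined word can be split.
english = {"the", "tragedy", "of", "alabama"}
print(split_joined_word("ofAlabama", english))

# Without a dictionary (language "none"), nothing can be verified,
# so only an explicit replace-list entry can repair the word.
print(split_joined_word("ofAlabama", set()))
print(apply_whole_word_replacements("the tragedy ofAlabama",
                                    {"ofAlabama": "of Alabama"}))
```

The second half mirrors the Polish / "none" case from the quoted post: with no word list to check against, a replace-list entry is the only thing that can repair the joined word.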