30th May 2020, 21:23   #1031
jlw_4049
Quote:
Originally Posted by Janusz
The problem is not with the OCR itself. The program changes character assignments in the character database: ż becomes Ż, ć becomes Ć, and z becomes Z.

I tested this on the _index.html file for "Batman Begins".
The character database, built from only the first two lines by adding each new unrecognized character, consists of 17 characters: ! , ? a c e h k l o p P R s t z ż
OCR was stopped at "n" on the third line.
As you can see, a lowercase "z" appeared on the third line instead of "Z".
We can start OCR from the first line repeatedly, and each time it will stop at "n". The character database content does not change.
However, suppose that on the first line we call "Inspect nocr matches for ...", which opens the "nOCR inspekt" window, click in the "Inspect items" box, and then select OK or Cancel to close the window without making any changes.
Reopening this window shows that "ż" has been reassigned to "Ż", although we did not change it. These changes are now saved permanently. Running OCR again shows that "ż" on lines 1 and 2 has been replaced with "Ż", and "z" with "Z" on line 3. The re-OCR then prompts for "ż" again.
You can also change the characters yourself in the settings.
