21st May 2020, 21:57   #1010
Janusz
Quote:
Originally Posted by Nikse555
Yes, the OCR process benefits from a good OCR fix replace list.
Quote:
<WordPart from="f" to="f " /> <!-- "f" will be two words -->
I already had this line in <OCRFixReplaceList>, so something else had to be going on here.
In the first version of the file, the line contained only one phrase, "photographerAdam".
"Adam" is only 4 letters, so I thought maybe that was the reason?

I have created a new file. I added a few lines and longer words starting with "A".



OCR worked, but as you can see above, not quite correctly.
The split happened, but </i> is not everywhere it should be. Only line 2 is right.
Then I disabled the split after "f" in "pol_OCRFixReplaceList.xml". The result is at the bottom.
The split is correct and </i> is where it should be, including on line 1.

Conclusion: such a combination of words is rare, so we have to decide for ourselves
whether and when to enable this option in our "OCRFixReplaceList.xml",
because the rule can do more harm than good.
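
To show why a rule like "f" -> "f " is risky, here is a minimal Python sketch. It is only my assumption of how a partial-word rule could be applied, not Subtitle Edit's actual code: the helper apply_word_part, the sample words and the small dictionary are all hypothetical. The idea is that the rule is tried at each "f", and a split is accepted only if every resulting piece is a dictionary word.

```python
def apply_word_part(word, part_from, part_to, dictionary):
    """Hypothetical helper (not Subtitle Edit's code): try a partial-word rule
    at each position and accept the first result whose pieces are all known words."""
    if word.lower() in dictionary:
        return word  # already a known word, leave it alone
    start = 0
    while True:
        idx = word.find(part_from, start)
        if idx == -1:
            return word  # no acceptable replacement found
        candidate = word[:idx] + part_to + word[idx + len(part_from):]
        pieces = candidate.split()
        if pieces and all(p.lower() in dictionary for p in pieces):
            return candidate
        start = idx + 1


# Assumed sample dictionary; a real spell-check dictionary is far larger.
words = {"chef", "adam", "off", "line"}

print(apply_word_part("chefAdam", "f", "f ", words))  # chef Adam
print(apply_word_part("offline", "f", "f ", words))   # off line
```

The first call is the wanted repair of two words glued together by OCR. The second shows the damage: a correct word that is simply missing from the dictionary gets split as well.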

If you ever have some spare time, you could look into the source: when I repeatedly switch the dictionary to any other installed one
and run OCR each time with the new dictionary, what now looks so nice at the bottom ends up looking like the top again.
Only restarting OCR restores order.
I know that nobody will mix dictionaries under normal use, but the problem exists.
__________________
Sorry for my mistakes - I'm using a translator.