3rd April 2020, 11:37 | #861 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,717
|
If a character is not recognized, you can expand the selection. If it is detected incorrectly, you can fix it afterwards while still in the OCR dialog: right-click the line with the issue and choose to inspect the matches, then right-click the incorrect match and you get an option to select a better multi match. I reported this as an issue a long time ago, so I just happen to know where it is. It's not easy to find by accident; the same goes for all the special characters you can add by right-clicking in the OCR dialog where it asks which character the image represents. I've used the software for years and only found that one out last week.
__________________
And if the band you're in starts playing different tunes I'll see you on the dark side of the Moon... |
3rd April 2020, 12:08 | #863 | Link | |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,717
|
Quote:
EDIT: the problem remains if the first part of % (a dot) is recognized, because expansion only works forwards and not backwards. In these cases, I abort the OCR, manually check the matching for that line to fix the incorrect detection, and restart the process from there.
|
3rd April 2020, 12:15 | #865 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,717
|
It's not in that main dialog. It appears when you start the OCR process, in the bitmap/character matching phase if there is no match.
3rd April 2020, 12:19 | #867 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,717
|
And if you haven't already: download and select the correct dictionary and enable "Fix OCR errors" to make your life a bit easier.
3rd April 2020, 13:15 | #869 | Link |
Pig on the wing
Join Date: Mar 2002
Location: Finland
Posts: 5,717
|
There are two dictionaries; are they equally bad? You can of course modify the dictionaries if there are clear errors. In my opinion, they are the key to making the whole process fast and accurate, but setting them up takes a lot of time in the beginning.
4th April 2020, 11:14 | #870 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,537
|
OK, I reset Latin.db and started to create a better OCR database, including italics and so on.
Having to reopen the same IFO several times, once for each subtitle stream, is really annoying. Am I doing something wrong?
__________________
@turment on Telegram |
15th April 2020, 19:18 | #871 | Link |
Registered User
Join Date: Sep 2018
Posts: 391
|
Is there any way to minimize the program while it's OCR'ing, so it can do its work in the background instead of being forced above all other applications?
Also, is there any way to make the program flash on the taskbar when it has a prompt, so you know? Last question: when updating from an older version, where are the user dictionary and name dictionary saved, so that they can be carried over to the newer version? Thanks! Program is awesome! |
16th April 2020, 10:37 | #872 | Link |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
@tormento/Boulder: A few years back I actually tried to programmatically create a binary-image-compare db with all Windows fonts in 5 different sizes... it was slow and, to my surprise, not even very good.
I'm not really sure what the best solution is, but I'm pretty sure it's not one db.
Binary image compare and "backward expansion": If you have a difficult character (like "%" where the first part is recognized as "o"), you can fix it in "Inspect compare matches" (double-click on the line in the list view): right-click in "Inspect items" and choose "Add better multi match".
@tormento: >I think the fact to have to reopen the same IFO for different subs different times is really annoying.
In the "Choose language" window you can do a "Save as..." for each language stream id.
@jlw_4049: To minimize the program while it's OCR'ing: the latest beta now has the minimize icon enabled. The latest beta also blinks in the taskbar when OCR has a prompt (up to 25 times, but only if the OCR window is not focused).
Dictionaries are saved in... press Win+R, paste "%appdata%\Subtitle Edit\Dictionaries" and press Enter. Most (*not all*) dictionaries have a "_user" file... e.g. the binary-image-compare db "latin.db" does not.
Please test the latest beta https://github.com/SubtitleEdit/subt...leEditBeta.zip (can also use "Tesseract 5 alpha") |
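For carrying dictionaries over between versions, the folder mentioned above can simply be copied somewhere safe before upgrading. A minimal Python sketch, assuming the "%appdata%\Subtitle Edit\Dictionaries" path from this post; the function names and the backup folder are made up for illustration and are not part of Subtitle Edit:

```python
import os
import shutil
from pathlib import Path


def dictionaries_dir(appdata=None):
    """Build the Dictionaries folder path quoted in the post.

    `appdata` defaults to the APPDATA environment variable (Windows).
    """
    base = appdata or os.environ.get("APPDATA", "")
    return Path(base) / "Subtitle Edit" / "Dictionaries"


def backup_dictionaries(source, target):
    """Copy every file directly under `source` into `target`.

    Returns the sorted list of copied file names.
    """
    source, target = Path(source), Path(target)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied


def user_files(names):
    """Pick out the "_user" files, i.e. the ones holding your own additions."""
    return [name for name in names if "_user" in name]
```

Calling `backup_dictionaries(dictionaries_dir(), some_backup_folder)` before an upgrade and copying the files back afterwards should be enough; since latin.db has no "_user" file, the whole folder is copied rather than only the user files.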
16th April 2020, 11:55 | #873 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,537
|
It's not necessary to build for every single font. Subtitles mainly use Arial/Helvetica derivatives, and for the "strange ones" we can build a db on our own. That's why I suggested using more than one db, so we can add more characters without "polluting" our standard font sets. Please have a look at SupRip and SubRip. Both use more than one set of fonts, and the latter can scan through them to find the best fit.
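To make the "scan through several databases and pick the best fit" idea concrete, here is a rough Python sketch. It is not how SE, SupRip or SubRip actually work internally; the bitmap format (rows of "0"/"1" strings), the database layout and the `max_diff` threshold are all assumptions for illustration:

```python
def pixel_difference(a, b):
    """Count differing pixels between two binary bitmaps of equal size.

    Returns None when the dimensions differ (not comparable in this sketch).
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_match(glyph, databases, max_diff=2):
    """Scan every database and return (db_name, text, diff) for the closest entry.

    Returns None when nothing is within `max_diff` pixels, which is the
    point where an OCR tool would have to ask the user.
    """
    best = None
    for db_name, entries in databases.items():
        for text, bitmap in entries:
            diff = pixel_difference(glyph, bitmap)
            if diff is None or diff > max_diff:
                continue
            if best is None or diff < best[2]:
                best = (db_name, text, diff)
    return best


# Hypothetical 4x4 glyphs: one db for the regular font, one for italics.
DATABASES = {
    "regular": [("o", ["0110", "1001", "1001", "0110"]),
                ("l", ["0100", "0100", "0100", "0100"])],
    "italic": [("o", ["0011", "0101", "1010", "1100"])],
}
```

With separate databases the italic "o" never has to compete with or overwrite the regular one, which is the "no polluting" point above.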
16th April 2020, 15:24 | #875 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,537
|
You're the programmer! You should know!
SupRip uses (you can find the file inside, did you look?):
The least we could do is start creating multiple databases on our own, with your assurance that SE scans for them; otherwise it would be useless.
16th April 2020, 18:01 | #877 | Link | |
Registered User
Join Date: Sep 2018
Posts: 391
|
Quote:
Once minimized it cannot be re-opened until it's done. Which isn't a negative for me, but just letting you know! Edit: Is there any way to do a batch OCR, so that I can add multiple files at a time and they get done one by one? I had 10 copies of the program running last night; it had a memory leak and I had to hard power cycle the machine. Last edited by jlw_4049; 17th April 2020 at 14:34. |
|
20th April 2020, 03:53 | #878 | Link |
Registered User
Join Date: Dec 2006
Posts: 5
|
First time trying
I tried this app out for the first time today. Seems pretty awesome. Thank you so much! My first try reading PGS, I went with default OCR settings, binary image compare (don't do that, results were terrible -- just about every "o" was interpreted as a G). Though almost all text was italic, maybe that was the problem with the bad "
The very first time, it asked me to confirm character by character. Ex: it showed me an "a" and said, "what is this?". I just clicked OK for a while. It took me a few subtitles before I realized I was telling it every character was "" (the empty string) in text. Then I wondered if I had ruined some dictionary by having it save away those bad values. So I ended up starting again from scratch -- which requires a time-consuming re-load. By the 3rd try I realized Tesseract was much, much better than the default binary image compare.
Some questions:
1) Loading a Blu-ray rip file takes several minutes. Any tricks to speed it up?
2) Is there a better (or different) online forum for this software, other than this thread?
3) If a file has two sets of subs (same language), is there a way to work on one, then load up the other, without re-loading the file from scratch? Looking at your comment, "In the "Choose language" window you can do a "Save as..." for each language stream id." --> I didn't see any "save as" under "options-> choose language". Is that where you were talking about?
4) What does a red duration cell indicate?
5) My biggest, worst problem: I load the file, which takes several minutes, then I do OCR on PGS subs and spend a bunch of time fixing them all up, then I hit "OK" to leave the OCR dialog, and .... all my work is gone. The program doesn't crash. It's just that nothing at all shows up in the main window. So I had nothing to show for all my work. It happened to me several times, so I never got a complete srt out of it, except one time when I thought I was going to lose all my work yet again and got it to write out an SRT with only a few subs as a test. Not sure what's going on. I'm afraid to put any work into editing subs, only for it all to go away when I click "OK". Please advise.
(TL;DR: sometimes the OCR work is copied into the main window, but often not, and I haven't figured out why it behaves one way sometimes and the other way at other times.)
This is a huge deal breaker for me, as I don't know whether all of my work will be lost. I think it might have something to do with click-and-dragging the mkv file (with subs) into the window to load it, vs loading using the "Open" dialog. Thanks for the hard work! Last edited by eddified; 20th April 2020 at 03:59. |
20th April 2020, 07:23 | #879 | Link |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
@jlw_4049:
>Once minimized it cannot be re-opened until it's done. Which isn't a negative for me, but just letting you know!
If I click on the "Maximize icon" in the lower left corner, it comes back...
"Tools" - "Batch convert" can also OCR, but you'll lose the chance to fix possible bad words etc.
@eddified: "Binary image compare" takes a bit longer to get running (and learn). You must find the best "x pixels is space" value, and you must add letters and fix wrong letters (double-click on a line in the list view to inspect/fix). It does not work well for *all* subtitles, but it's especially nice if you have more than one file with the same font. Tesseract is good too and very easy, but if problems occur they tend to be harder to fix. It is also a little slower.
1) It takes about 1 second here for a 25 MB file...
2) This is probably the best forum for OCR stuff
3) That's for Vob ripping from DVD...
4) Red background color in the duration cell indicates that the duration is too short or too long - see options - settings - general (min/max/cps)
5) Sorry about that, it's a bug in SE 3.5.14 with drag'n'drop (File - open works fine)
And please test the latest beta, as SE 3.5.15 should be out soon https://github.com/SubtitleEdit/subt...leEditBeta.zip |
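As a rough illustration of answer 4), the kind of check behind a red duration cell could look like the Python sketch below. The threshold defaults (1000 ms, 8000 ms, 25 characters per second) and the function name are assumptions, not SE's actual values or code; the real limits are whatever is set under options - settings - general:

```python
def duration_problem(text, start_ms, end_ms,
                     min_ms=1000, max_ms=8000, max_cps=25.0):
    """Return a short description of the timing problem, or None if OK.

    All three limits are placeholder defaults for this sketch.
    """
    duration = end_ms - start_ms
    if duration <= 0 or duration < min_ms:
        return "too short"
    if duration > max_ms:
        return "too long"
    # Characters per second, ignoring line breaks inside the subtitle.
    chars = len(text.replace("\n", ""))
    cps = chars / (duration / 1000.0)
    if cps > max_cps:
        return "too many characters per second"
    return None
```

For example, a five-letter subtitle shown for only half a second would come back as "too short" with these placeholder limits.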
20th April 2020, 13:14 | #880 | Link |
Registered User
Join Date: Jun 2006
Posts: 350
|
Nikse555
Bugs:
Feature requests:
Just in case you missed it: what do you think of my suggestion #1 here? It would give us great flexibility when adjusting subtitles. Currently we have to perform many manual calculations, e.g. if we want to shift the subtitle start time without changing the subtitle end time.
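Until something like that exists in the program, the manual calculation can be scripted outside of it. A small Python sketch, assuming standard SRT timing lines ("HH:MM:SS,mmm --> HH:MM:SS,mmm"); the function names are made up for illustration:

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_ms(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to milliseconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_stamp(total_ms):
    """Convert milliseconds back to an SRT timestamp."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_start(timing_line, delta_ms):
    """Shift only the start time of an SRT timing line; the end stays fixed.

    The new start is clamped so it never goes below zero or past the end.
    """
    start, end = timing_line.split(" --> ")
    new_start = max(0, to_ms(start) + delta_ms)
    new_start = min(new_start, to_ms(end))
    return f"{to_stamp(new_start)} --> {end}"
```

So `shift_start("00:01:02,500 --> 00:01:05,000", -700)` gives "00:01:01,800 --> 00:01:05,000": the line starts 700 ms earlier and ends at the same time.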
__________________
Windows 8.1 x64 Magically yours Raistlin |