#21 | Link |
Registered User
Join Date: Mar 2010
Posts: 52
|
Thanks Nikse555
I have a timing problem with some subtitles when I use OCR. If I first extract the subtitles (*.VOBSUB) from the video and then run OCR on them, some start and end times come out wrong. For example, where the original sub has:
Start Time --> End Time
00:00:13,097 --> 00:00:19,185
OCR shows:
00:00:13,097 --> 00:00:17,185
But if I use the subtitles directly from the video, OCR shows the correct start and end times.
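One way to narrow down a timing problem like this is to save both OCR results as SubRip (.srt) text and list the cues whose timing differs. The following is only a minimal Python sketch, assuming both texts contain the same cues in the same order; the function names are made up for illustration and are not part of Subtitle Edit.

```python
import re

# Matches an SRT timing line such as "00:00:13,097 --> 00:00:19,185".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_timings(srt_text):
    """Return a list of (start_ms, end_ms) for every timing line found."""
    timings = []
    for match in TIMING.finditer(srt_text):
        groups = match.groups()
        timings.append((to_ms(*groups[:4]), to_ms(*groups[4:])))
    return timings

def timing_diffs(reference_srt, ocr_srt):
    """Return (cue_number, start_diff_ms, end_diff_ms) for cues that differ."""
    pairs = zip(parse_timings(reference_srt), parse_timings(ocr_srt))
    diffs = []
    for number, (ref, ocr) in enumerate(pairs, start=1):
        if ref != ocr:
            diffs.append((number, ocr[0] - ref[0], ocr[1] - ref[1]))
    return diffs

# The two timings quoted in the post above (the cue text is hypothetical).
reference = "1\n00:00:13,097 --> 00:00:19,185\nHello\n"
ocr_result = "1\n00:00:13,097 --> 00:00:17,185\nHello\n"
print(timing_diffs(reference, ocr_result))
```

A negative end difference means the OCR result ends the cue earlier than the reference does.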
#22 | Link |
Registered User
Join Date: Jul 2011
Posts: 224
|
I have another feature request, or maybe you know of a configuration file I can edit so that a replacement is always performed.
I would like to replace ’ with ' because ’ is displayed very weirdly (the last word is supposed to be: didn't): [image]
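Until such a replacement is configurable, the same clean-up can be done on the saved text outside Subtitle Edit. A minimal Python sketch, assuming the file has already been read into a string; only U+2019 (’) comes from the post above, the other three mappings are assumptions added for illustration.

```python
# Map typographic quotes to plain ASCII ones.
# U+2019 is the apostrophe mentioned in the post; the other entries are
# assumptions that are often cleaned up at the same time.
REPLACEMENTS = str.maketrans({
    "\u2019": "'",   # right single quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def normalize_quotes(text):
    """Return text with typographic quotes replaced by ASCII quotes."""
    return text.translate(REPLACEMENTS)

print(normalize_quotes("He didn\u2019t"))
```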
#23 | Link |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
@MajorX: It is hard to say why without the actual sub... The OCR window has a check box for whether to use time codes from the .idx file or from the .sub file.
@xekon: Works here in latest version: http://www.nikse.dk/SubtitleEdit.zip |
#24 | Link |
Registered User
Join Date: Jul 2011
Posts: 224
|
WOW! You weren't kidding about it going faster! I just did a couple more episodes and it's zooming through the lines much faster!
edit: odd new bug: "I'm sorry!" was detected as: Code:
I.m s 0 r rY !
Last edited by xekon; 29th October 2011 at 08:19.
#25 | Link |
Registered User
Join Date: Jan 2010
Posts: 330
|
Hello there!
I am having trouble with OCR. I am recognizing from the SUP format, have tried both methods, and both have significant inaccuracies.
In the pattern comparison mode, the engine totally ignores the differences between the letters 'i' and 'l', and between 'c', 'o' and 'e'. Every letter in such a group gets the character that was assigned at the first occurrence of one of the letters from the "same" group. For example, the 1st subtitle contains the word "more": the wizard stops at 'o' and I assign it 'o'. When it passes over 'e', it does not ask for a letter again, even though that is the 1st 'e' in the subtitles, and automatically assigns it 'o'. That's very bad. I don't know if that's a result of some auto-corrections made by SE, but the letters still get wrongly assigned even if I turn off all the auto-corrections on the right side. That's about the character comparison method.
Tesseract seems to work better but has considerable flaws too. Some characters are automatically uppercased even if they are lowercase in the source matrix; this especially concerns 's', 'z', 'c' and 'a'. All occurrences of these letters seem to be uppercased, regardless of their case in the original matrix, if they stand alone or are the 1st letter of a word. They are kept lowercase in the middle of a word.
Please give me some suggestions for making at least one of the methods functional, so that most words are recognized properly and don't need to be corrected by the spell checker. The uppercase problem doesn't even seem repairable by spell checker processing! Thank you!
#26 | Link |
Registered User
Join Date: Jul 2011
Posts: 224
|
I have another feature request: could we have a checkbox to omit all <i> </i> tags? They are being used for only half of a line when the whole line is italic, and they are also being used when there are no italic lines at all.
Right now, after I rip a sub, I go through and do a find/replace to delete them all, but it would be great to have that as a feature in Subtitle Edit.
Very often !! gets detected as ll. Is this something that can be fixed? Or is there something I can do to help with the detection of exclamation points? Or do I have to wait until Tesseract is updated?
EDIT: On a side note, whatever you did for MS MODI OCR seems to have worked, and it definitely does help! Here is an example of ll instead of " or !!: [image] [image]
Last edited by xekon; 30th October 2011 at 09:15.
#27 | Link |
Registered User
Join Date: Jul 2011
Posts: 224
|
OMG OMG OMG! The programmer in me has just thought of a VERY COOL feature you could add!
Call it a visual tool for super fast comparison. (OCR can only get so good, and if you want to verify perfect subs, this is a good way to do it.) The goal should always be perfect OCR on the first sweep; visually checking the subs afterwards is just to verify, and the quicker you can do that the better.
Let me know what you think of this idea. I am sure it would actually be pretty fun to program. Please let me know what you think, because I think it would be AWESOME! I am drawing an illustration in Photoshop now.
EDIT: OK, to illustrate my idea... Right now you OCR a .SUP file, then use the arrow key to go down line by line, reading the text and then looking at the image to check that they are the same. That is not exactly quick: the brain has to think more and remember more, and your eyes have to move and focus on more than one area. Below is my idea:
Basically, use an OpenGL or DirectX library that can overlay text, or any library that can overlay text with transparency, and size the text to roughly overlay the SUB image with around 50-60% transparency. The letters don't have to line up perfectly; anywhere close will let you tell at a glance whether the sub and the text match visually. (Basically, you read the sub line ONLY once, and your brain looks for discrepancies as you do it, versus reading two or three times, moving your eyes between locations, and having to hope you remember correctly.)
I think for somebody who visually checks the OCR for their subs, this would probably speed up the process by 200%+. See how easy it is to see that they match: [image]
Here is one that passed the OCR but is incorrect: [image]
Here is another one that passed the OCR but is incorrect (depending on the library used, you could even apply a border/stroke to the outside of the letters): [image]
Here is another. There is probably one in every episode that passes through the OCR, green light and all; you just have to look carefully (you might even be able to adjust the thickness of the characters so that they usually fall within the bounds of the SUB image character outlines): [image]
Last edited by xekon; 30th October 2011 at 10:51.
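The overlay part of this idea comes down to alpha blending the rendered text over the subtitle image. Below is a rough sketch in plain Python on tiny grayscale grids, only to show the arithmetic; a real tool would use a graphics library. The function names and the use of `None` for "no text pixel here" are assumptions for illustration. A transparency of 50-60% corresponds to a text opacity of roughly 0.4-0.5.

```python
def blend_pixel(image_value, text_value, text_opacity):
    """Alpha-blend one grayscale text pixel (0-255) over one image pixel."""
    return round(text_opacity * text_value + (1.0 - text_opacity) * image_value)

def overlay(image, text_layer, text_opacity=0.5):
    """Blend text_layer over image; None in text_layer means no text pixel."""
    result = []
    for image_row, text_row in zip(image, text_layer):
        row = []
        for image_value, text_value in zip(image_row, text_row):
            if text_value is None:
                row.append(image_value)  # keep the subtitle image pixel as-is
            else:
                row.append(blend_pixel(image_value, text_value, text_opacity))
        result.append(row)
    return result

# Hypothetical 2x3 grids: the OCR text matches in the first row and is
# shifted by one pixel in the second row.
sub_image = [[0, 100, 0], [0, 100, 0]]
rendered_text = [[None, 200, None], [None, None, 200]]
print(overlay(sub_image, rendered_text, text_opacity=0.5))
```

Where text and image agree, the blended pixels sit on top of the image's own strokes; where they disagree, text pixels land on background, which is what makes a mismatch visible at a glance.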
#28 | Link |
Registered User
Join Date: Nov 2001
Posts: 1,104
|
Looks impressive, but I think it's overkill. Why not simply have a small window showing the item and an editable text window below it that shows the OCRed text? If they don't match, simply alter the text and move on to the next item.
__________________
MultiMakeMKV: MakeMKV batch processing (Win) MultiShrink: DVD Shrink batch processing Offizieller Übersetzer von DVD Shrink deutsch |
#29 | Link | |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
Quote:
|
|
#30 | Link |
Registered User
Join Date: Jul 2011
Posts: 224
|
Nikse555, please let me know what you think of my idea. If it's not something you're interested in, then I will try adding it myself. I just noticed Subtitle Edit is open source.
Could I please have a copy of the source code that is as current as http://www.nikse.dk/SubtitleEdit.zip? The one on code.google.com is from October 14.
Last edited by xekon; 31st October 2011 at 20:03.
#31 | Link |
Registered User
Join Date: Jul 2011
Posts: 224
|
Subtitle Edit gives very accurate OCR results. There are usually only 1-3 wrong subs out of 300 lines, which is quite impressive. So generally you won't need to do much editing, only verifying. The method I posted is the quickest way I can think of to scan through entire sub files after the OCR and verify them visually.
|
#32 | Link | ||
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
Hi Anakunda!
Quote:
A work-around is to right-click on the offending line in the list view and choose "Inspect compare matches for current image"; there you can choose "Add better match" to correct mistakes. (My image compare code is a bit slow for Blu-ray images...)
Quote:
If yes, could you provide a test file + a few line numbers? |
||
#33 | Link |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
Another way to proofread would be to right-click in the list view and choose "Save all images with html index...". This displays a web page with all images + OCR'ed text if available. In the latest version, this also shows the text with a background color.
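For anyone curious what such an index page involves, here is a small Python sketch that builds one HTML table row per subtitle image with its OCR'ed text next to it. This is not Subtitle Edit's actual output; the file names, page layout and the `build_html_index` helper are assumptions for illustration.

```python
import html

def build_html_index(entries):
    """Build an HTML page from (image_file_name, ocr_text) pairs."""
    rows = []
    for number, (image_file, text) in enumerate(entries, start=1):
        # Escape everything so OCR'ed characters like < or & cannot break the page.
        safe_src = html.escape(image_file, quote=True)
        safe_text = html.escape(text).replace("\n", "<br>")
        rows.append(
            f'<tr><td>{number}</td><td><img src="{safe_src}" alt=""></td>'
            f"<td>{safe_text}</td></tr>"
        )
    body = "\n".join(rows)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Subtitle images</title></head>\n'
        f'<body><table border="1">\n{body}\n</table></body></html>\n'
    )

# Hypothetical image names and OCR results.
entries = [("0001.png", "I'm sorry!"), ("0002.png", "He said <no>.")]
page = build_html_index(entries)
print(page)
```

Having the image and the text side by side in one scrollable page is what makes this usable for proofreading.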
|
#34 | Link |
Registered User
Join Date: Mar 2010
Posts: 52
|
Hi Nikse555
Can you please check this *.SUP file? I get only strange symbols with OCR. http://www.mediafire.com/?aoy66c5ue9mbah9
#35 | Link |
Registered User
Join Date: Jul 2011
Posts: 224
|
MajorX I tried your file with Nikse555's latest version here: http://www.nikse.dk/SubtitleEdit.zip
I also got lots of symbols when "Try MS MODI OCR for unknown words" was unchecked, but if you use the MS MODI OCR it detects all of them just fine. [image] Give it a shot.
PS: I wonder if that subtitle file has ever been resized... the letters are really bad quality.
Last edited by xekon; 2nd November 2011 at 03:33.
#38 | Link | |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
Quote:
[image] This font doesn't look Blu-ray-like but seems clear enough. Resizing did not help, but changing the font color to white seems to help, so this is included in the latest version, which should handle your sup better: http://www.nikse.dk/SubtitleEdit.zip
|
#39 | Link | |
Registered User
Join Date: Mar 2010
Posts: 52
|
Quote:
|
#40 | Link |
Registered User
Join Date: Feb 2004
Location: Mars
Posts: 428
|
Subtitle Edit 3.2.3 is now finally out with lots of minor improvements and fixes!
Change log
New: Added Brazilian Portuguese - thx XXXXXXXXXX
New: Added Italian language file - thx Maff
New: Added Portuguese (Portugal) language file - thx Ricardo Perdigão
New: Added Japanese language file - thx Nardog
New: Added Spanish language file - thx m2s
New: Support for subtitle format AvidCaption - thx Laszlo
New: Support for F4 subtitle formats - thx Fred
New: Export to Blu-ray sup format
Improved: Updated Tesseract to 3.01. Now includes (some) italic detection + adds support for Arabic, Hebrew, Hindi and Thai
Improved: Undo improved so it also works for textbox + redo (Ctrl+Y)
Improved: Many new configurable shortcuts (e.g. for fullscreen video player)
Improved: OCR tweaked a bit + BluRay sup files are processed faster
Improved: TextBox with current subtitle now shows cursor position - thx Leszek
Improved: Subtitle format PAC much improved - thx Peter
Improved: Subtitle format FCP Xml improved - thx Ulrik
Improved: Subtitle format D-Cinema improved - thx Karam
Improved: Splitting of lines - Thx Trottel
Improved: Auto break lines - thx Majid
Improved: Some fixes for Fix common errors/Remove text for HI - thx Majid
Improved: Optimized Fix Common Errors
Improved: DirectShow can now also play audio-only files
Fixed: Crash when setting Options - thx karmazyn
Fixed: Crash in set color (or set font) - thx LEO33
Fixed: Crash/freeze when loading large subtitle files - thx Ulrik
Fixed: Bug when clicking in list view while running ocr - thx sialivi
Fixed: De-selecting text in textbox via single click - thx XhmikosR
Fixed: Possible crash in spell check + German dictionary should work
Fixed: Missing save/load of a fix common errors setting - thx menes
Fixed: Removed Microsoft translate as it's useless with new quotas
Fixed: Milliseconds in timed text - thx Calle
Fixed: Names with spaces now works in spell check - thx Dr. jackson
Fixed: Do not use frame rate if it's zero (audio files) - thx dixie.fever
Fixed: Possible crash when saving xml files - thx Peter
http://code.google.com/p/subtitleedit/downloads/list
Last edited by Nikse555; 13th January 2012 at 15:18. Reason: forgot link