20th November 2010, 21:52   #1
lovelove
Registered User
 
Join Date: Mar 2010
Posts: 106
subtitle conversion from DVD to text without OCR?

Hi. I couldn't find a lot of info on the net on subtitles. But from what I understand, DVD subtitles don't exist as text streams but as bitmap (pixel) images ONLY, correct? (... although MediaInfo displays my .vob files as having text streams)

I found a lot of threads here on doom9 pointing to programs which demux the subtitles from the .vob files. Depending on the program, this results in .sup files or in a .sub/.idx combination. But none of them are human readable, are they? My guess is that the files still contain a stream of bitmap graphics.

My question is: Is there any program which would decode the data in those files directly to text without the intermediate OCR step? A .sup to .srt converter? Or a .sub to .srt converter? Or something like that?
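
Side note on the .sub/.idx pair: the .idx half is a plain-text index, but it only holds timing and byte offsets into the .sub file, not the subtitle text itself. A minimal Python sketch of reading those cue lines follows. The sample index content is made up for illustration, and the parser assumes the usual `timestamp: hh:mm:ss:ms, filepos: <hex>` layout.

```python
import re

# Matches VobSub .idx cue lines such as:
#   timestamp: 00:01:02:345, filepos: 000001800
CUE_RE = re.compile(
    r"^timestamp:\s*(\d+):(\d+):(\d+):(\d+),\s*filepos:\s*([0-9a-fA-F]+)"
)


def parse_idx_cues(idx_text):
    """Return a list of (start_ms, byte_offset) pairs found in .idx text."""
    cues = []
    for line in idx_text.splitlines():
        match = CUE_RE.match(line.strip())
        if match is None:
            continue  # comments, palette, language headers, and so on
        hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
        start_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
        # filepos is assumed to be a hexadecimal offset into the .sub file
        cues.append((start_ms, int(match.group(5), 16)))
    return cues


# Hypothetical sample, not taken from a real disc
SAMPLE_IDX = """\
# VobSub index file, v7 (do not modify this line!)
id: en, index: 0
timestamp: 00:00:01:500, filepos: 000000000
timestamp: 00:01:02:345, filepos: 000001800
"""

if __name__ == "__main__":
    for start_ms, offset in parse_idx_cues(SAMPLE_IDX):
        print(start_ms, offset)
```

Each cue only says when a picture should appear and where its data starts in the .sub file; the words themselves exist only as pixels at that offset.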


Last edited by lovelove; 20th November 2010 at 21:57.
20th November 2010, 23:36   #2
Inspector.Gadget
Registered User
 
Join Date: May 2008
Posts: 1,618
Quote:
although MediaInfo displays my .vob files as having text streams
Some DVDs have closed captions, which are actually text streams muxed into the video stream. They are not the same as bitmap subtitles, which MUST be OCR'd to convert them to text subs, because there is no metadata in the bitmap sub stream that will tell any "dumb" converter what each letter should be. You can hand off the OCR to automatic methods with varying degrees of success using SubRip (very accurate but requires some intervention) or DVDSubEdit (reasonably accurate, requires little user input).
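
To illustrate why there is nothing for a "dumb" converter to read, here is a small Python sketch of run-length decoding. It is deliberately simplified: it uses plain (run length, colour index) pairs rather than the actual DVD subpicture bit layout, and the glyph data is hypothetical. What comes out of the decoder is a grid of colour indices; nothing in it says which letter the grid depicts.

```python
def rle_decode_row(pairs):
    """Expand (run_length, color_index) pairs into a flat list of pixels."""
    row = []
    for run_length, color_index in pairs:
        row.extend([color_index] * run_length)
    return row


def decode_bitmap(encoded_rows):
    """Decode every row of a simplified run-length encoded picture."""
    return [rle_decode_row(pairs) for pairs in encoded_rows]


def render(bitmap):
    """Draw the bitmap as text: '#' for any non-zero colour, '.' for 0."""
    return "\n".join(
        "".join("#" if pixel else "." for pixel in row) for row in bitmap
    )


# Hypothetical 5x5 picture: 0 = background, 1 = text colour
ENCODED_GLYPH = [
    [(1, 1), (3, 0), (1, 1)],
    [(1, 1), (3, 0), (1, 1)],
    [(5, 1)],
    [(1, 1), (3, 0), (1, 1)],
    [(1, 1), (3, 0), (1, 1)],
]

if __name__ == "__main__":
    print(render(decode_bitmap(ENCODED_GLYPH)))
```

A person looking at the rendered grid sees a letter, but the program only has pixel values. Recognising the shape as a character is exactly the OCR step.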
21st November 2010, 00:54   #3
lovelove
Registered User
 
Join Date: Mar 2010
Posts: 106
Quote:
Originally Posted by Inspector.Gadget
bitmap subtitles, which MUST be OCR'd to translate them to text subs, because there is no metadata in the bitmap sub stream that will tell any "dumb" converter what each letter should be.
OK, I am not sure if I will manage to articulate my reply in an intelligible way, but I will try. Take a JPEG-encoded image. You can open it in an image editor, put the image on screen and then rotate it 90 degrees to the right. And then save it again as a .jpeg file.
On the other hand, there are programs which manipulate the JPEG bitstream directly and rotate the image without ever bringing it on the screen (not even "hidden", because the JPEG bitstream is never decoded to x,y image pixels but manipulated directly, without decoding).

When translating this situation to subtitles, I was hoping that *somehow* the step of bringing the subtitles on the screen (hidden or unhidden) for OCR could be avoided. Now I admit that even after pondering this for quite a bit, I fail to see how this could possibly be done. But my hope was that the experts here, who have seen a lot in their life, would maybe know of *any* other way than OCR...

Is the analogy more or less understandable?

Last edited by lovelove; 21st November 2010 at 01:10.
21st November 2010, 01:07   #4
lovelove
Registered User
 
Join Date: Mar 2010
Posts: 106
Quote:
Originally Posted by Inspector.Gadget
Some DVDs have closed captions, which are actually text streams muxed into the video stream. They are not the same as bitmap subtitles
And how can I convert the closed captions on DVD to a text file?


The MediaInfo output of one of my .vob files (the second 1 GB vob file of six) looks as follows:

Code:
Text #1
ID :	224 (0xE0)-DVD-1
Format :	EIA-608
Muxing mode :	MPEG Video / DVD-Video
Muxing mode, more info :	Muxed in Video #1
Stream size :	0.00 Byte (0%)

Text #2
ID :	32 (0x20)
Format :	RLE
Format/Info :	Run-length encoding

Text #3
ID :	33 (0x21)
Format :	RLE
Format/Info :	Run-length encoding

Text #4
ID :	34 (0x22)
Format :	RLE
Format/Info :	Run-length encoding

When playing this .vob file in VLC, I have this in my subtitle menu:
Quote:
closed captions 1
closed captions 2
closed captions 3
closed captions 4
When playing the VIDEO_TS folder, VLC offers me a lot more subtitles:
Code:
Track 1 - [Russian]
Track 2 - [English]
Track 3 - [Espagnol]
Track 4 - [Esperanto]
closed captions 1
closed captions 2
closed captions 3
closed captions 4
Why the difference between MediaInfo and VLC?
hm...this seems so complicated and I just don't know where to start to better understand this ...
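
One way to read the MediaInfo IDs above, sketched in Python under stated assumptions: in a DVD program stream, 0xE0 to 0xEF are MPEG video stream IDs (the EIA-608 captions ride inside the video's user data), and 0x20 to 0x3F are the substream IDs used for bitmap subpicture tracks. The function name and wording of the labels are made up for illustration. A likely explanation for the VLC difference is that the language names live in the .IFO files of the VIDEO_TS folder rather than in the .vob itself, so a player opening a single .vob has no names to show.

```python
def classify_dvd_text_id(stream_id):
    """Give a rough label for a stream ID as reported by MediaInfo."""
    if 0xE0 <= stream_id <= 0xEF:
        # Closed captions are not a separate stream; they are carried
        # inside the MPEG video stream with this ID.
        return "video stream (closed captions embedded in it)"
    if 0x20 <= stream_id <= 0x3F:
        # Bitmap subtitles: track number is the offset from 0x20.
        return f"bitmap subpicture track {stream_id - 0x20}"
    return "unknown"


if __name__ == "__main__":
    # The four IDs from the MediaInfo output: 224, 32, 33, 34
    for stream_id in (224, 32, 33, 34):
        print(hex(stream_id), classify_dvd_text_id(stream_id))
```

Read this way, Text #1 is the closed-caption text and Text #2 to #4 are run-length encoded picture tracks that would still need OCR.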

Last edited by lovelove; 21st November 2010 at 01:13.
21st November 2010, 01:41   #5
Inspector.Gadget
Registered User
 
Join Date: May 2008
Posts: 1,618
Quote:
And how can I convert the closed captions on DVD to a text file?
CCExtractor.
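
A minimal sketch of calling it from Python, assuming the command-line binary is installed as `ccextractor` on the PATH (the name can differ per platform) and that the basic `ccextractor <input> -o <output>` form applies. The file names are placeholders.

```python
import shutil
import subprocess


def build_ccextractor_cmd(vob_path, srt_path):
    """Build the argument list: input file first, then -o <output file>."""
    return ["ccextractor", vob_path, "-o", srt_path]


def extract_captions(vob_path, srt_path):
    """Run CCExtractor on one .vob file and write the captions as text."""
    if shutil.which("ccextractor") is None:
        raise FileNotFoundError("ccextractor not found on PATH")
    subprocess.run(build_ccextractor_cmd(vob_path, srt_path), check=True)


if __name__ == "__main__":
    # Placeholder file names; only prints the command, does not run it
    print(" ".join(build_ccextractor_cmd("VTS_01_2.VOB", "captions.srt")))
```

This only handles the EIA-608 closed captions. The RLE bitmap tracks are untouched by it and still need an OCR tool.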
21st November 2010, 01:58   #6
lovelove
Registered User
 
Join Date: Mar 2010
Posts: 106
Ok, thanks. I'll try.
Any idea about the results posted in #4 ?