Old 3rd October 2010, 22:33   #1  |  Link
ruediger.s
Registered User
 
Join Date: Sep 2010
Posts: 1
Announcing VobSub2SRT: convert .sub/.idx to .srt on Linux

VobSub2SRT is a Linux command-line tool that converts VobSub (.idx/.sub) subtitles into the .srt subtitle format. It is based on MPlayer's vobsub code and uses Tesseract for the OCR part.

You can get the source and manual at http://github.com/ruediger/VobSub2SRT

The quality of the OCR depends heavily on the quality of the subtitles. I'm planning to add some preprocessing features (like rescaling) to improve the OCR accuracy.

I'm developing VobSub2SRT on Kubuntu (currently 10.04), but it should work on other Linux systems as well (and maybe even Mac OS X).

To build vobsub2srt on Ubuntu use
Code:
sudo apt-get install libavutil-dev tesseract-ocr-dev tesseract-ocr-eng build-essential cmake
./configure
make
sudo make install
To convert subtitles call
Code:
vobsub2srt Filename
with Filename being the file name of the subtitles without the .idx/.sub extension. The result is written to Filename.srt. To get more information, pass --verbose. With --lang langcode you can select the language stream (make sure you have the Tesseract data for that language installed).
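
For a whole directory of subtitles, the call above can be wrapped in a small loop. This is only a sketch: it assumes vobsub2srt is on the PATH and relies on nothing beyond the base-name convention and the --lang option described above. The function name convert_all and the VOBSUB2SRT override variable are made up for illustration.

```shell
#!/bin/sh
# Convert every .idx/.sub pair in a directory to .srt.
# Assumption: vobsub2srt takes the base name without extension, as described above.
# VOBSUB2SRT is a hypothetical override, handy for dry runs (e.g. VOBSUB2SRT=echo).
convert_all() {
    dir=${1:-.}
    lang=${2:-}
    for idx in "$dir"/*.idx; do
        [ -e "$idx" ] || continue              # no .idx files: the glob stays literal
        base=${idx%.idx}                       # strip the extension
        if [ ! -e "$base.sub" ]; then
            echo "skipping $idx: no matching .sub" >&2
            continue
        fi
        if [ -n "$lang" ]; then
            "${VOBSUB2SRT:-vobsub2srt}" --lang "$lang" "$base"
        else
            "${VOBSUB2SRT:-vobsub2srt}" "$base"
        fi
    done
}
```

Calling `convert_all /path/to/subs de` would then run the converter once per pair with --lang de; leaving out the second argument omits --lang.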

I hope this tool is useful to you and please give me some feedback.
Old 22nd December 2010, 10:14   #2  |  Link
Selur
Registered User
 
 
Join Date: Oct 2001
Location: Germany
Posts: 7,259
Nice! Thanks! I really like the idea of having a command-line VobSub to SRT converter. (If someone finds the motivation and time to make a Windows port of this, I'll integrate it into Hybrid.)

Cu Selur
Old 26th January 2011, 18:13   #3  |  Link
bjrnfrdnnd
Registered User
 
Join Date: Nov 2003
Posts: 3
Compiling on Mac OS X

Hi,

I am running Mac OS X 10.6.6 on an early-2008 MacBook Pro.
I have installed MacPorts and installed Tesseract from its repositories.
The version of Tesseract seems to be 3.00:
Code:
 
cn-b204-2:ruediger-VobSub2SRT-e46e81a bn$ tesseract -v
tesseract 3.00
The version of your code seems to be e46e81a, judging by the name of the download.

When configuring, I get the following error:
Code:
cn-b204-2:ruediger-VobSub2SRT-e46e81a bn$ ./configure 
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /opt/local/bin/gcc
-- Check for working C compiler: /opt/local/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /opt/local/bin/c++
-- Check for working CXX compiler: /opt/local/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Source: /Users/bn/Movies/ruediger-VobSub2SRT-e46e81a
-- Binary: /Users/bn/Movies/ruediger-VobSub2SRT-e46e81a/build
-- Build type: Debug
-- checking for module 'libavutil'
--   found libavutil, version 50.15.1
-- Found Tesseract: Tesseract_LIBRARIES-NOTFOUND;/opt/local/lib/libtiff.dylib 
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Tesseract_LIBRARIES (ADVANCED)
    linked by target "vobsub2srt" in directory /Users/bn/Movies/ruediger-VobSub2SRT-e46e81a/src

-- Configuring incomplete, errors occurred!
I do not know what the Tesseract libraries are. The Tesseract installation did, however, come with training data for English. On my machine, this data is located at
/opt/local/share/tessdata
in a file named eng.traineddata.

Is there any way to get this compiled on my MacBook?
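
The NOTFOUND entry in the log means CMake could not locate a Tesseract library file, which is separate from the training data under tessdata. A first diagnostic step might be to check whether any such library exists under the MacPorts prefix. The helper below is a sketch; the name find_tesseract_libs is made up, and /opt/local is assumed as the prefix because that is where the log finds libtiff.

```shell
#!/bin/sh
# List candidate Tesseract library files under an install prefix.
# Assumption: MacPorts installs into /opt/local, as the libtiff path in the log suggests.
find_tesseract_libs() {
    prefix=${1:-/opt/local}
    # Only look directly in <prefix>/lib; print nothing if the directory is missing.
    find "$prefix/lib" -maxdepth 1 -name 'libtesseract*' 2>/dev/null | sort
}

# If a library turns up, its path could be handed to CMake as a cache entry, e.g.
#   cmake -DTesseract_LIBRARIES=<path printed above> <source dir>
# Whether this project's CMake files honor that override is an assumption, not tested here.
```

If the function prints nothing, the installed Tesseract package probably ships no library that the build can link against under that prefix.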
Old 26th January 2011, 22:36   #4  |  Link
bjrnfrdnnd
Registered User
 
Join Date: Nov 2003
Posts: 3
Problems with German language?

Hi,
I am also trying to use your program to convert German subtitles.
This is on a Linux amd64 machine running Ubuntu 10.10;
the version of Tesseract is 2.04-2.
German language files are installed and are found under
/usr/share/tesseract-ocr/tessdata

These files start with deu.

When running vobsub2srt with --lang de, I get the following
Code:
vobsub2srt  --lang de --verbose  output

VobSub: Can't open IFO file
vobsub: ignoring size: 720x576
vobsub: ignoring palette: bbe20c, 0ba7cc, 101010, eaeaea, 438143, ec14ed, ebff0b, 0d617a, 7b7b7b, d1d1d1, 7b2a0e, 0d790d, 0ce60b, eaeaea, bc5a38, bbd838
vobsub: ignoring forced subs: OFF
[vobsub] subtitle (vobsubid): 0 language de
Index Count: 1
Id: 0 Lang: <no id>
Selected VOBSUB language: 0 language: de
Unable to load unicharset file /usr/share/tesseract-ocr/tessdata/ger.unicharset
For some reason, the program is looking for files starting with ger.
What could I do to make it work? I already tried installing Tesseract 3.00, but this seems to be incompatible with vobsub2srt.
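
The mismatch looks like two different ISO 639-2 codes for German: "ger" is the bibliographic code and "deu" the terminological one, and the installed Tesseract data uses deu. One possible workaround, not tested against vobsub2srt itself, is to create ger.* aliases next to the deu.* files. The function name alias_tessdata is made up for this sketch, and writing to /usr/share would need root.

```shell
#!/bin/sh
# Create <dst>.* symlinks for every <src>.* file in a tessdata directory.
# Defaults follow the paths and prefixes quoted above (deu.* files, ger.* expected).
alias_tessdata() {
    tessdata=${1:-/usr/share/tesseract-ocr/tessdata}
    src=${2:-deu}
    dst=${3:-ger}
    for f in "$tessdata/$src".*; do
        [ -e "$f" ] || continue                # no matching files: glob stays literal
        name=${f##*/}                          # e.g. deu.unicharset
        suffix=${name#"$src".}                 # e.g. unicharset
        link="$tessdata/$dst.$suffix"
        # Leave existing files or links alone; the link target is relative to tessdata.
        [ -e "$link" ] || [ -L "$link" ] || ln -s "$name" "$link"
    done
}
```

Running `alias_tessdata` with no arguments would act on the default directory; the links can be removed again without touching the original deu.* files.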
Tags
linux, ocr, srt, vobsub
