Old 4th May 2020, 20:46   #1  |  Link
Perenista
Registered User
 
Join Date: Oct 2013
Posts: 81
How can I strip the timecodes from SRT subtitles?

I have a video with two SRT subtitles.

The first one is perfectly synchronized. The second is not, and fixing it by hand would be hard or take me too much time. However, I need both, because they are in different languages (one is English, the other Portuguese).

Let's call the perfect subtitle A (english) and the wrong one B (portuguese).

So I had the following idea:

1) Save subtitle B as plain text, so that all timecodes are stripped from the file. That means that instead of:

************

1
00:00:10,969 --> 00:00:12,471
Look.

2
00:00:14,806 --> 00:00:17,600
And you said they were incompatible.

************

We would have:

************
Look.
And you said they were incompatible.
************

For lines 1 and 2 of said TXT.
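
Step 1 could be sketched in Python roughly like this. It is only a minimal sketch, assuming a well-formed SRT file (cue number, timing line, text, blank line); the function name `strip_timecodes` is made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:10,969 --> 00:00:12,471".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def strip_timecodes(srt_text):
    """Return only the subtitle text lines, dropping cue numbers and timing lines."""
    kept = []
    lines = srt_text.splitlines()
    for i, line in enumerate(lines):
        s = line.strip()
        if not s or TIMING.match(s):
            continue  # blank separator or timing line
        # A bare number directly followed by a timing line is a cue number, not text.
        if s.isdigit() and i + 1 < len(lines) and TIMING.match(lines[i + 1].strip()):
            continue
        kept.append(s)
    return "\n".join(kept)

sample = """1
00:00:10,969 --> 00:00:12,471
Look.

2
00:00:14,806 --> 00:00:17,600
And you said they were incompatible.
"""
print(strip_timecodes(sample))
```

Note that this flat text no longer records where one subtitle ends and the next begins, which matters as soon as a subtitle has two or three lines.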

2) Since subtitle A has the correct timecodes, I was thinking of doing this:

- Remove all text from subtitle A, and preserve only the timecodes.
- Insert all text from subtitle B into A. Then save as a new file.

Is there a way to do this?

Doing it manually would take a looooong time, since there are 883 lines in the plain text file.
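
Step 2 might look like the following in Python. This is a sketch under the assumption that both files have exactly the same number of subtitles in the same order; `parse_srt` and `merge` are hypothetical helper names, and the Portuguese lines are made-up sample text.

```python
import re

def parse_srt(srt_text):
    """Split SRT text into (timing_line, text) pairs, one per subtitle."""
    text = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    cues = []
    # Subtitles are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            raise ValueError("malformed subtitle block: %r" % block)
        cues.append((lines[1].strip(), "\n".join(lines[2:])))
    return cues

def merge(timing_srt, text_srt):
    """Take the timing lines from timing_srt and the text from text_srt."""
    timed = parse_srt(timing_srt)
    texts = parse_srt(text_srt)
    if len(timed) != len(texts):
        raise ValueError("subtitle count mismatch: %d vs %d" % (len(timed), len(texts)))
    out = []
    for n, ((timing, _), (_, text)) in enumerate(zip(timed, texts), start=1):
        out.append("%d\n%s\n%s\n" % (n, timing, text))
    return "\n".join(out)

# Subtitle A: good timecodes. Subtitle B: wrong timecodes, wanted text.
a = ("1\n00:00:10,969 --> 00:00:12,471\nLook.\n\n"
     "2\n00:00:14,806 --> 00:00:17,600\nAnd you said they were incompatible.\n")
b = ("1\n00:00:05,000 --> 00:00:06,000\nOlha.\n\n"
     "2\n00:00:07,000 --> 00:00:09,000\nSegunda legenda.\n")
print(merge(a, b))
```

Keeping each subtitle's text as one unit (instead of a flat list of lines) means two- and three-line subtitles stay attached to the right timecode.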

Last edited by Perenista; 4th May 2020 at 20:50.
Old 4th May 2020, 23:17   #2  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 8,519
For anyone that might want to consider this, providing more details might be a good idea.
Do both subs correspond exactly, ie have the exact same number of subtitles,
and does each subtitle number have the same meaning in each language?
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 5th May 2020 at 02:09.
Old 5th May 2020, 00:39   #3  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 887
Swap every 5th line?
Maybe there is a script that can do this in Notepad++.
Well, 2- and 3-liners are not covered by that approach...
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're working on that issue. Synce invntoin uf lingöage..."
Old 5th May 2020, 02:05   #4  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 8,519
OK, I've got an Avisynth script that parses an srt file OK, all the way to the end. (PAL Alien Eng) [So that is a start at least, it is not capable of doing what you want just yet]

However, something that only just occurred to me [when I encountered a problem] was that the srt I used was encoded as UTF-8 with BOM.
I needed to convert it to UTF-8 without BOM, otherwise the script failed on parsing the BOM (the 3 hidden bytes EF BB BF at the start of the file).

PsPad text editor - Menu/Encoding/Unicode UTF8 no BOM (65001)

I know zip about character encoding, but once converted the script works [might have lost some weird accents or special characters].

Are your srt files in some weird encoding? [Portuguese, so I guess so]
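
For anyone without PsPad, the same BOM removal can be sketched in Python. This assumes the file really is UTF-8 (with or without BOM); `strip_utf8_bom` and `convert_file` are made-up names, and a file in another codepage would first need decoding with its own encoding.

```python
import codecs

def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def convert_file(src_path, dst_path):
    """Rewrite a UTF-8 file as UTF-8 without BOM, keeping line endings as they are."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one;
    # writing back with plain "utf-8" produces a file without BOM.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# The BOM sits in front of the first subtitle number and breaks naive parsing.
raw = codecs.BOM_UTF8 + b"1\r\n00:00:10,969 --> 00:00:12,471\r\nLook.\r\n"
print(strip_utf8_bom(raw)[:1])
```

Since only the three BOM bytes are dropped, accented characters in the subtitle text should survive unchanged.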

Last edited by StainlessS; 5th May 2020 at 02:50.
Old 5th May 2020, 03:35   #5  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 8,519
OK, try this, needs Avisynth+, and RT_Stats v2.0Beta12 [See Mediafire link below this post in my sig]

1_ParseAndWriteDBase.avs [Run once for each srt file: set PROC_SRT = 1 and run for the first file, then set PROC_SRT = 2 and run again for the second file, then run script 2]
Code:
# 1_ParseAndWriteDBase.avs

/*
    Requires Avs+, RT_Stats v2.0 Beta 12

    Will Require non weird character encoding, if necessary try UNICODE UTF8 without BOM (65001) [I did my test convert via PsPad text editor]
*/

##################
### CONFIG  ######
PROC_SRT = 1                  # 1 Process FN1, 2 Process FN2 (and write DB if WRITE_DBASE=true)
WRITE_DBASE= True             # If False then does Not write DBase : Need create a DBase for EACH srt File. [FALSE for testing script parsing only]
##
FN1 =".\Alien_1.srt"          # MUST Create DBASE for Each SRT file.
FN2 =".\Alien_2.srt"          #
##################
##################


FN = (PROC_SRT==2) ? FN2 : FN1
FN=RT_GetFullPathName(FN)
DB=FN+".DB"
TypeStr="s512s512"                                      # DBase, 2 fields both String[512]
(WRITE_DBASE) ? RT_FileDelete(DB) : NOP                 # Delete any existing DBase if writing new one
(WRITE_DBASE) ? RT_DBaseAlloc(DB,0,TypeStr) : NOP       # Create Empty DBase (0 records)
##################
LINES=RT_FileQueryLines(FN)
IN = False
SubN=1         # Subtitle Number
TimeS=""       # Times String
Subtitles=""   # Subtitle String
SubLines=0     # Lines Gotten for Subtitle
SubStartLine=0 # Line number where subtitle start [ie the subtitle number line, relative 1]
SubIx=0        #
DIGITALP="0123456789"
TIMEALP=DIGITALP+":,"

For(i=0,LINES-1) {
    Txt=RT_ReadTxtFromFile(FN,Lines=1,Start=i).RT_TxtGetLine.ChrEatWhite.RevStr.ChrEatWhite.RevStr   # Get Line of text, remove EOL and Eat leading & trailing White Space
    RT_DebugF("%d] IN=%s %s",i,IN,Txt)
    len=Txt.StrLen
    if(!IN) {
        if(len>0) {
            numlen=Txt.StrMatchChrLen(DIGITALP,sig=True)
            Assert(numlen>0,RT_String("Line %d Subtitle Number %d NOT FOUND\n'%s'",i+1,SubN,Txt))
            Number=txt.RT_NumberValue
            Assert(Number==SubN,RT_String("Line %d Expecting subtitle Number %d Got %d\n'%s'",i+1,SubN,Number,Txt))
            s=Txt.MidStr(numlen+1)
            Assert(s=="",RT_String("Line %d Expecting nothing after subtitle number %d, Got '%s'\n'%s'",i+1,SubN,s,Txt))
            TimeS=""
            Subtitles=""
            SubLines=0
            SubStartLine = i+1
            SubIx=1     # Expecting Times next
            IN = True
        }
    } else {
        if(len>0) {
            if(SubIx==1) { # Times
                OpenTimeLen=Txt.StrMatchChrLen(TIMEALP,sig=True)
                Assert(Len==29,RT_String("Line %d Times expecting 29 characters, got %d \n'%s'",i+1,len,Txt))
                Assert(OpenTimeLen==12,RT_String("Line %d OpenTime expecting 12 characters, got %d \n'%s'",i+1,OpenTimeLen,Txt))
                s=Txt.MidStr(13)
                Assert(s.LeftStr(5)==" --> ",RT_String("Line %d expecting ' --> '\n'%s'",i+1,Txt))
                s=s.MidStr(6)
                CloseTimeLen=s.StrMatchChrLen(TIMEALP,sig=True)
                Assert(CloseTimeLen==12,RT_String("Line %d CloseTime expecting 12 characters, got %d \n'%s'",i+1,CloseTimeLen,txt))
                Times=Txt
                SubIx=2
            } else if(SubIx==2) { # Text
                Subtitles=(SubLines==0)?Txt:Subtitles+Chr(10)+Txt
                SubLines=SubLines+1
            }
        }
        if(len==0 || i+1>=LINES) {
            Assert(SubIx==2,RT_String("Line %d Expecting %s\n'%s'",i+1,SubIx==0?"Subtitle Number":"Times",Txt))
            RT_DebugF("###########\n%d] %d\n    %s\n   %s\n###########",SubStartLine,SubN,Times,Subtitles)
            If(WRITE_DBASE) {
                RT_DBaseAppend(DB,TimeS,Subtitles)
            }
            SubN=SubN+1
            IN = False
        }
    }
}

Return (!WRITE_DBASE)
    \ ? MessageClip(RT_String("Parse Only\n'%s'",FN))
    \ : MessageClip(RT_String("Parse And Write DBAse\n'%s'\n'%s'",FN,DB))

##############################

Function ChrEatWhite(String S)   {i=1 C=RT_Ord(S,i) While(C==32||C>=8&&C<=13)                        {i=i+1 C=RT_Ord(S,i)} return i>1?MidStr(S,i):S}

# Return extent of string S [ie length from the beginning] that matches any character in Chars set of characters [Default case insignificant]. # StrMatchChrLen("1234.567abcd","0123456789.") = 8
Function StrMatchChrLen(String s,String Chars,Bool "Sig") {
    Function __StrMatchChrLen_LOW(String s,String Chars,int n) { c=s.MidStr(n+1,1) Return(c==""||Chars.FindStr(c)==0) ? n : s.__StrMatchChrLen_LOW(Chars,n+1) }
    Sig=Default(Sig,False)  # Default Case Insignificant
    s=(Sig)?s:s.UCASE   Chars=(Sig)?Chars:Chars.UCASE
    __StrMatchChrLen_LOW(s,Chars,0)
}
2_WriteFixSrt.avs [Write output SRT file].
Code:
# 2_WriteFixSrt.avs

/*
    Requires Avs+, RT_Stats v2.0 Beta 12

    NUMBER OF SUBTITLES MUST MATCH ELSE ABORTS

*/

##################
FN1 =".\Alien_1.srt"           # Same As in 1_ParseAndWriteDBase.avs
FN2 =".\Alien_2.srt"           # Same As in 1_ParseAndWriteDBase.avs
SRT =".\Alien_Out.srt"         # Output Srt file
TIMES_FROM = 1                 # Which DBase to Get Times From
SUBS_FROM  = 2                 # Which DBase to Get Subtitles From
##################

FN1=RT_GetFullPathName(FN1)
FN2=RT_GetFullPathName(FN2)
SRT=RT_GetFullPathName(SRT)
DB1=FN1+".DB"
DB2=FN2+".DB"

###

Records  = RT_DBaseRecords(DB1)
Records2 = RT_DBaseRecords(DB2)
Assert(Records == Records2,RT_String("DBase Subtitle Count MisMatch DB1=%d DB2=%d",Records,Records2))
Assert(1 <= TIMES_FROM <= 2,"1 <= TIMES_FROM <= 2")
Assert(1 <= SUBS_FROM  <= 2,"1 <= SUBS_FROM  <= 2")
TDB = (TIMES_FROM==1) ? DB1 : DB2
SDB = (SUBS_FROM ==1) ? DB1 : DB2
RT_FileDelete(SRT)            # Prep for write
###

for(i=0,Records-1) {
    TimeS = RT_DBaseGetField(TDB,i,0)
    SubS  = RT_DBaseGetField(SDB,i,1)
    RT_WriteFile(SRT,"%d\n%s\n%s\n\n",i+1,TimeS,SubS,Append=True)
}

MessageClip("All Done")
I tried with the exact same srt file for each DBase, and wrote the output srt file.
Compared the output srt and one of the inputs with KDiff and it said "Binary Equal", ie exactly the same as the source srt [both input srts were exactly the same].
I then changed the last subtitle time millisecs to "000" in the source 1 srt, and a single word in the last subtitle text in the source 2 srt,
and repeated both scripts.
A KDiff compare of the output against each individual input flagged only the deliberate changes in each, so it seems pretty spot on if
your subs files correspond exactly 1:1.
It will throw an error if the input files have differing numbers of subs.
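
For reference, the timing-line checks done in script 1 (29 characters in total, two 12-character times joined by " --> ") can be approximated in Python with a regular expression. This is a sketch, not a port of the Avisynth script: `parse_timing` and `to_ms` are hypothetical names, and the pattern is a bit stricter than the script because it also fixes where the digits, colons and comma must sit.

```python
import re

TIME = r"\d{2}:\d{2}:\d{2},\d{3}"  # 12 characters, e.g. 00:00:10,969
TIMING_LINE = re.compile(r"^(%s) --> (%s)$" % (TIME, TIME))

def parse_timing(line):
    """Return (start, end) time strings, or raise ValueError on a malformed line."""
    m = TIMING_LINE.match(line.strip())
    if m is None:
        raise ValueError("not a valid SRT timing line: %r" % line)
    return m.group(1), m.group(2)

def to_ms(stamp):
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    hours, minutes, rest = stamp.split(":")
    seconds, millis = rest.split(",")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)

start, end = parse_timing("00:00:10,969 --> 00:00:12,471")
print(start, end, to_ms(end) - to_ms(start))
```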

Last edited by StainlessS; 5th May 2020 at 03:51.
Old 5th May 2020, 04:29   #6  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 752
Don't do things the complex way when you can do them the easy way.

1. Use Aegisub and open the subtitle with the correct timecodes.
2. Select all subtitle lines by pressing Ctrl-A.
3. Copy the lines by pressing Ctrl-C.
4. Open the other subtitle with the wrong timecodes.
5. Select all subtitle lines by pressing Ctrl-A.
6. Paste the lines over by pressing Ctrl-Shift-V.
7. Click the Times button to select only Start Time and End Time, and then press OK.

You have to make sure that both subtitles have the same number of lines, otherwise you'll have to use Shift Times by pressing Ctrl-I to shift some of the lines.
Old 5th May 2020, 05:31   #7  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 8,519
Thank you HolyWu, I have never used Aegisub [well, only ever used SubRip and occasionally SubEdit] and had
no idea that you could do that. Cheers.
Old 5th May 2020, 06:15   #8  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 887
Thank you both !
Old 5th May 2020, 07:56   #9  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 328
In Subtitle Edit you can:

1) Load sub with bad time codes
2) File -> Import time codes... choose sub with good time codes


SE can (as Aegisub) also do column paste.

Alternately you can sync via "Visual sync" or "Point sync via other subtitle".
Old 5th May 2020, 16:27   #10  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 8,519
Thanks for that too Nikse, we learn something new every day.
If I had the OP's problem, I probably would have used Visual Sync [I sometimes do].
I only wrote that script because the OP seemed to want the exact same timecodes, and I
thought it would be an interesting way to pass some time [and to have a subs SRT parsing
script that could be easily modified for other purposes].

I did leave it for a couple of hours before I did the script [started it after Emulgator posted]
and as I had little else to do whilst waiting for an avs SysInfo plugin update, I thought I'd give it a go.
I probably should have just pointed out Visual Sync in SubEdit. [I did not know about "Import time codes"]
Old 5th May 2020, 18:55   #11  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,563
There are now also a few solutions that try to cleverly sync to existing subtitles or even audio (with speech recognition) without relying on both files matching each other perfectly.

https://subsync.online/
https://github.com/kaegi/alass
https://github.com/saurabhshri/CCAligner

Don't know how good they are, though.
Old 5th May 2020, 19:12   #12  |  Link
StainlessS
HeartlessS Usurer
 
Join Date: Dec 2009
Location: Over the rainbow
Posts: 8,519
And thanks also SG, I've got a real awkward one where those first two might come in handy. That was also probably the real reason
that I did the SRT parsing thing, I was gonna try something similar in script.
Old 15th October 2020, 19:56   #13  |  Link
marsoupilami
Registered User
 
Join Date: Sep 2003
Location: Austria
Posts: 103
@Perenista
I don't know if it is of interest any more...

For syncing 2 different subtitle files I use my crazy tool SubSplicer - although I've written it for a different purpose, it does exactly what you want:
.) You can load your "in sync" English subs with "load upper" into the left-side panel and the "out of sync" Portuguese subs with "load lower" into the right one.
.) Select/click the 1st English sub and then the corresponding Portuguese one. Click "add link"
.) Do the same for the last English and the last (corresponding) Portuguese one, "add link" - now you have two syncing points defined
.) Press the right (Portuguese) "Synchronize" button - this will adjust the first and the last subtitle exactly, all subs in between will be squeezed or stretched like a "rubberband"
.) Now you can save the synchronized Portuguese subs by pressing the right "Save As". (The codepage can be changed if required via the codepage selector near the "Load" button)

This "rubberbanding" doesn't care about exact subtitle count, line numbers or whatever
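
The "rubberband" idea reads like a linear stretch between the two sync points. A rough Python sketch of that arithmetic follows; it is an assumption about the principle, not SubSplicer's actual code, and `make_rubberband` is a made-up name. Times are in milliseconds.

```python
def make_rubberband(old_first, new_first, old_last, new_last):
    """Build a function mapping out-of-sync times onto the in-sync timeline.

    old_first/old_last: times of the first and last linked subtitle in the bad file.
    new_first/new_last: times of the corresponding subtitles in the good file.
    """
    if old_last == old_first:
        raise ValueError("the two sync points must differ")
    scale = (new_last - new_first) / (old_last - old_first)

    def remap(t_ms):
        # Linear interpolation: both sync points land exactly,
        # everything in between is stretched or squeezed proportionally.
        return round(new_first + (t_ms - old_first) * scale)

    return remap

# First sub: 10.0 s in the bad file should be 12.0 s; last sub: 110.0 s should be 117.0 s.
remap = make_rubberband(10000, 12000, 110000, 117000)
print(remap(10000), remap(60000), remap(110000))
```

Because every time is remapped individually, the two files do not need the same number of subtitles for this to work.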

Happy syncing
__________________
Houmba!

IsoPuzzle has a new home
SubSplicer is born!

http://members.aon.at/marsoupilami/
Old 19th November 2020, 20:36   #14  |  Link
dev-null
Registered User
 
Join Date: Jan 2020
Posts: 2
If you already have one subtitle in proper sync and just want to time a second subtitle according to it (one that may or may not have the same number of lines) then you can use "alass" (https://github.com/kaegi/alass) for it:

$ alass reference_subtitle_with_proper_sync.srt subtitle_that_needs_to_be_fixed.srt output.srt