OK, try this. It needs Avisynth+ and RT_Stats v2.0 Beta 12 [see the Mediafire link in my sig below this post].
1_ParseAndWriteDBase.avs [Run once for each srt file: set PROC_SRT = 1 and run for the first file, then set PROC_SRT = 2 and run again, then run script 2]
Code:
# 1_ParseAndWriteDBase.avs
/*
Requires Avs+, RT_Stats v2.0 Beta 12
Requires a non weird character encoding; if necessary, try converting to UNICODE UTF8 without BOM (65001) [I did my test convert via PsPad text editor]
*/
##################
### CONFIG ######
PROC_SRT = 1 # 1 Process FN1, 2 Process FN2 (and write DB if WRITE_DBASE=true)
WRITE_DBASE= True # If False, does not write the DBase [use False to test script parsing only]. A DBase must be created for EACH srt file.
##
FN1 =".\Alien_1.srt" # MUST Create DBASE for Each SRT file.
FN2 =".\Alien_2.srt" #
##################
##################
FN = (PROC_SRT==2) ? FN2 : FN1
FN=RT_GetFullPathName(FN)
DB=FN+".DB"
TypeStr="s512s512" # DBase, 2 fields both String[512]
(WRITE_DBASE) ? RT_FileDelete(DB) : NOP # Delete any existing DBase if writing new one
(WRITE_DBASE) ? RT_DBaseAlloc(DB,0,TypeStr) : NOP # Create Empty DBase (0 records)
##################
LINES=RT_FileQueryLines(FN)
IN = False
SubN=1 # Subtitle Number
TimeS="" # Times String
Subtitles="" # Subtitle String
SubLines=0 # Lines Gotten for Subtitle
SubStartLine=0 # Line number where subtitle start [ie the subtitle number line, relative 1]
SubIx=0 # Parse state: 0 = expecting Subtitle Number, 1 = expecting Times, 2 = reading Subtitle text
DIGITALP="0123456789"
TIMEALP=DIGITALP+":,"
For(i=0,LINES-1) {
    Txt=RT_ReadTxtFromFile(FN,Lines=1,Start=i).RT_TxtGetLine.ChrEatWhite.RevStr.ChrEatWhite.RevStr # Get Line of text, remove EOL and Eat leading & trailing White Space
    RT_DebugF("%d] IN=%s %s",i,IN,Txt)
    len=Txt.StrLen
    if(!IN) {
        if(len>0) {
            numlen=Txt.StrMatchChrLen(DIGITALP,sig=True)
            Assert(numlen>0,RT_String("Line %d Subtitle Number %d NOT FOUND\n'%s'",i+1,SubN,Txt))
            Number=txt.RT_NumberValue
            Assert(Number==SubN,RT_String("Line %d Expecting subtitle Number %d Got %d\n'%s'",i+1,SubN,Number,Txt))
            s=Txt.MidStr(numlen+1)
            Assert(s=="",RT_String("Line %d Expecting nothing after subtitle number %d, Got '%s'\n'%s'",i+1,SubN,s,Txt))
            TimeS=""
            Subtitles=""
            SubLines=0
            SubStartLine = i+1
            SubIx=1 # Expecting Times next
            IN = True
        }
    } else {
        if(len>0) {
            if(SubIx==1) { # Times
                OpenTimeLen=Txt.StrMatchChrLen(TIMEALP,sig=True)
                Assert(Len==29,RT_String("Line %d Times expecting 29 characters, got %d \n'%s'",i+1,len,Txt))
                Assert(OpenTimeLen==12,RT_String("Line %d OpenTime expecting 12 characters, got %d \n'%s'",i+1,OpenTimeLen,Txt))
                s=Txt.MidStr(13)
                Assert(s.LeftStr(5)==" --> ",RT_String("Line %d expecting ' --> '\n'%s'",i+1,Txt))
                s=s.MidStr(6)
                CloseTimeLen=s.StrMatchChrLen(TIMEALP,sig=True)
                Assert(CloseTimeLen==12,RT_String("Line %d CloseTime expecting 12 characters, got %d \n'%s'",i+1,CloseTimeLen,txt))
                Times=Txt
                SubIx=2
            } else if(SubIx==2) { # Text
                Subtitles=(SubLines==0)?Txt:Subtitles+Chr(10)+Txt
                SubLines=SubLines+1
            }
        }
        if(len==0 || i+1>=LINES) {
            Assert(SubIx==2,RT_String("Line %d Expecting %s\n'%s'",i+1,SubIx==0?"Subtitle Number":"Times",Txt))
            RT_DebugF("###########\n%d] %d\n %s\n %s\n###########",SubStartLine,SubN,Times,Subtitles)
            If(WRITE_DBASE) {
                RT_DBaseAppend(DB,TimeS,Subtitles)
            }
            SubN=SubN+1
            IN = False
        }
    }
}
Return (!WRITE_DBASE)
\ ? MessageClip(RT_String("Parse Only\n'%s'",FN))
\ : MessageClip(RT_String("Parse And Write DBAse\n'%s'\n'%s'",FN,DB))
##############################
Function ChrEatWhite(String S) {i=1 C=RT_Ord(S,i) While(C==32||C>=8&&C<=13) {i=i+1 C=RT_Ord(S,i)} return i>1?MidStr(S,i):S}
# Return extent of string S [ie length from the beginning] that matches any character in Chars set of characters [Default case insignificant]. # StrMatchChrLen("1234.567abcd","0123456789.") = 8
Function StrMatchChrLen(String s,String Chars,Bool "Sig") {
Function __StrMatchChrLen_LOW(String s,String Chars,int n) { c=s.MidStr(n+1,1) Return(c==""||Chars.FindStr(c)==0) ? n : s.__StrMatchChrLen_LOW(Chars,n+1) }
Sig=Default(Sig,False) # Default Case Insignificant
s=(Sig)?s:s.UCASE Chars=(Sig)?Chars:Chars.UCASE
__StrMatchChrLen_LOW(s,Chars,0)
}
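If one of the Times asserts in script 1 fires, a small sketch like this shows what the parser expects from a times line. The sample line T is made up for illustration (not from any real srt), and StrMatchChrLen is simply copied from the script above so the snippet stands alone. Treat it as a rough check, not as part of the two scripts.

```avisynth
# Hypothetical sample times line, for illustration only
T = "00:01:02,345 --> 00:01:05,000"
TIMEALP = "0123456789:,"
OpenLen  = T.StrMatchChrLen(TIMEALP,sig=True)              # Extent of open time, stops at first space
Rest     = T.MidStr(13)                                    # Remainder, starting at the " --> " separator
CloseLen = Rest.MidStr(6).StrMatchChrLen(TIMEALP,sig=True) # Extent of close time
# For the sample line above this should show: Len=29 Open=12 Arrow=' --> ' Close=12
Return MessageClip(RT_String("Len=%d Open=%d Arrow='%s' Close=%d",T.StrLen,OpenLen,Rest.LeftStr(5),CloseLen))
##############################
# Copied unchanged from 1_ParseAndWriteDBase.avs
Function StrMatchChrLen(String s,String Chars,Bool "Sig") {
Function __StrMatchChrLen_LOW(String s,String Chars,int n) { c=s.MidStr(n+1,1) Return(c==""||Chars.FindStr(c)==0) ? n : s.__StrMatchChrLen_LOW(Chars,n+1) }
Sig=Default(Sig,False) # Default Case Insignificant
s=(Sig)?s:s.UCASE Chars=(Sig)?Chars:Chars.UCASE
__StrMatchChrLen_LOW(s,Chars,0)
}
```

Those are the same three values script 1 asserts on (29, 12, 12), so a times line with a different layout, eg a shorter millisecs field, will be rejected rather than silently written to the DBase.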
2_WriteFixSrt.avs [Write output SRT file].
Code:
# 2_WriteFixSrt.avs
/*
Requires Avs+, RT_Stats v2.0 Beta 12
NUMBER OF SUBTITLES MUST MATCH ELSE ABORTS
*/
##################
FN1 =".\Alien_1.srt" # Same As in 1_ParseAndWriteDBase.avs
FN2 =".\Alien_2.srt" # Same As in 1_ParseAndWriteDBase.avs
SRT =".\Alien_Out.srt" # Output Srt file
TIMES_FROM = 1 # Which DBase to Get Times From
SUBS_FROM = 2 # Which DBase to Get Subtitles From
##################
FN1=RT_GetFullPathName(FN1)
FN2=RT_GetFullPathName(FN2)
SRT=RT_GetFullPathName(SRT)
DB1=FN1+".DB"
DB2=FN2+".DB"
###
Records = RT_DBaseRecords(DB1)
Records2 = RT_DBaseRecords(DB2)
Assert(Records == Records2,RT_String("DBase Subtitle Count MisMatch DB1=%d DB2=%d",Records,Records2))
Assert(1 <= TIMES_FROM <= 2,"1 <= TIMES_FROM <= 2")
Assert(1 <= SUBS_FROM <= 2,"1 <= SUBS_FROM <= 2")
TDB = (TIMES_FROM==1) ? DB1 : DB2
SDB = (SUBS_FROM ==1) ? DB1 : DB2
RT_FileDelete(SRT) # Prep for write
###
for(i=0,Records-1) {
    TimeS = RT_DBaseGetField(TDB,i,0)
    SubS = RT_DBaseGetField(SDB,i,1)
    RT_WriteFile(SRT,"%d\n%s\n%s\n\n",i+1,TimeS,SubS,Append=True)
}
MessageClip("All Done")
I tried it with the exact same srt file for each DBase, and wrote the output srt file.
Compared the output srt against one of the inputs with KDiff and it said "Binary Equal", ie exactly the same as the source srt [both input srt files were identical].
I then changed the last subtitle time millisecs to "000" in the source 1 srt, and a single word in the last subtitle text in the source 2 file,
and repeated both scripts.
A KDiff compare of the output against each individual input flagged only the deliberate changes in each, so it seems pretty spot on, provided
your subs files correspond exactly 1:1.
Script 2 will throw an error if the two input files have differing numbers of subs.
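Before running script 2 it can help to eyeball whether the two DBases line up. The following is only a rough sketch: the script name 3_CheckDBase.avs and the SHOW count are assumptions made up for illustration, and it uses only RT_Stats calls already used in the two scripts above. It reports the record counts and sends the first few times strings from each DBase to the RT_DebugF debug output.

```avisynth
# 3_CheckDBase.avs   [hypothetical helper, NOT one of the two original scripts]
FN1 =".\Alien_1.srt"     # Same As in 1_ParseAndWriteDBase.avs
FN2 =".\Alien_2.srt"     # Same As in 1_ParseAndWriteDBase.avs
SHOW = 5                 # Assumed value: how many records to dump from the start
##################
DB1=RT_GetFullPathName(FN1)+".DB"
DB2=RT_GetFullPathName(FN2)+".DB"
R1 = RT_DBaseRecords(DB1)
R2 = RT_DBaseRecords(DB2)
N = (R1<R2) ? R1 : R2    # Smaller of the two record counts
N = (N<SHOW) ? N : SHOW  # Limit to SHOW records
for(i=0,N-1) {
    T1 = RT_DBaseGetField(DB1,i,0)   # Field 0 = Times string
    T2 = RT_DBaseGetField(DB2,i,0)
    RT_DebugF("%d] T1='%s' T2='%s'",i+1,T1,T2)
}
Return MessageClip(RT_String("DB1 = %d records\nDB2 = %d records",R1,R2))
```

If the counts differ, script 2 will abort anyway; if they match but the dumped times are far apart, the two files probably do not correspond 1:1 even though the numbers of subs agree.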