5th May 2020, 03:35   #5
StainlessS
HeartlessS Usurer
 
OK, try this. It needs Avisynth+ and RT_Stats v2.0 Beta 12 [see the MediaFire link in my sig below this post].

1_ParseAndWriteDBase.avs [Run once for each srt file: set PROC_SRT = 1 and run for the first file, then set PROC_SRT = 2 and run for the second file, then run script 2]
Code:
# 1_ParseAndWriteDBase.avs

/*
    Requires Avs+, RT_Stats v2.0 Beta 12

    Requires a plain (non weird) character encoding; if necessary convert to UNICODE UTF8 without BOM (65001) [I did my test conversion with the PsPad text editor]
*/

##################
### CONFIG  ######
PROC_SRT = 1                  # 1 = Process FN1, 2 = Process FN2 (also writes DBase if WRITE_DBASE=True)
WRITE_DBASE= True             # If False, does not write DBase [False for testing script parsing only]. Need to create a DBase for EACH srt file.
##
FN1 =".\Alien_1.srt"          # MUST Create DBASE for Each SRT file.
FN2 =".\Alien_2.srt"          #
##################
##################


FN = (PROC_SRT==2) ? FN2 : FN1
FN=RT_GetFullPathName(FN)
DB=FN+".DB"
TypeStr="s512s512"                                      # DBase, 2 fields both String[512]
(WRITE_DBASE) ? RT_FileDelete(DB) : NOP                 # Delete any existing DBase if writing new one
(WRITE_DBASE) ? RT_DBaseAlloc(DB,0,TypeStr) : NOP       # Create Empty DBase (0 records)
##################
LINES=RT_FileQueryLines(FN)
IN = False
SubN=1         # Subtitle Number
TimeS=""       # Times String
Subtitles=""   # Subtitle String
SubLines=0     # Lines Gotten for Subtitle
SubStartLine=0 # Line number where subtitle starts [ie the subtitle number line, relative 1]
SubIx=0        # Parse state: 0 = initial, 1 = expecting Times line, 2 = reading subtitle Text lines
DIGITALP="0123456789"
TIMEALP=DIGITALP+":,"

For(i=0,LINES-1) {
    Txt=RT_ReadTxtFromFile(FN,Lines=1,Start=i).RT_TxtGetLine.ChrEatWhite.RevStr.ChrEatWhite.RevStr   # Get Line of text, remove EOL and Eat leading & trailing White Space
    RT_DebugF("%d] IN=%s %s",i,IN,Txt)
    len=Txt.StrLen
    if(!IN) {
        if(len>0) {
            numlen=Txt.StrMatchChrLen(DIGITALP,sig=True)
            Assert(numlen>0,RT_String("Line %d Subtitle Number %d NOT FOUND\n'%s'",i+1,SubN,Txt))
            Number=txt.RT_NumberValue
            Assert(Number==SubN,RT_String("Line %d Expecting subtitle Number %d Got %d\n'%s'",i+1,SubN,Number,Txt))
            s=Txt.MidStr(numlen+1)
            Assert(s=="",RT_String("Line %d Expecting nothing after subtitle number %d, Got '%s'\n'%s'",i+1,SubN,s,Txt))
            TimeS=""
            Subtitles=""
            SubLines=0
            SubStartLine = i+1
            SubIx=1     # Expecting Times next
            IN = True
        }
    } else {
        if(len>0) {
            if(SubIx==1) { # Times
                OpenTimeLen=Txt.StrMatchChrLen(TIMEALP,sig=True)
                Assert(Len==29,RT_String("Line %d Times expecting 29 characters, got %d \n'%s'",i+1,len,Txt))
                Assert(OpenTimeLen==12,RT_String("Line %d OpenTime expecting 12 characters, got %d \n'%s'",i+1,OpenTimeLen,Txt))
                s=Txt.MidStr(13)
                Assert(s.LeftStr(5)==" --> ",RT_String("Line %d expecting ' --> '\n'%s'",i+1,Txt))
                s=s.MidStr(6)
                CloseTimeLen=s.StrMatchChrLen(TIMEALP,sig=True)
                Assert(CloseTimeLen==12,RT_String("Line %d CloseTime expecting 12 characters, got %d \n'%s'",i+1,CloseTimeLen,txt))
                Times=Txt
                SubIx=2
            } else if(SubIx==2) { # Text
                Subtitles=(SubLines==0)?Txt:Subtitles+Chr(10)+Txt
                SubLines=SubLines+1
            }
        }
        if(len==0 || i+1>=LINES) {
            Assert(SubIx==2,RT_String("Line %d Expecting %s\n'%s'",i+1,SubIx==0?"Subtitle Number":"Times",Txt))
            RT_DebugF("###########\n%d] %d\n    %s\n   %s\n###########",SubStartLine,SubN,Times,Subtitles)
            If(WRITE_DBASE) {
                RT_DBaseAppend(DB,TimeS,Subtitles)
            }
            SubN=SubN+1
            IN = False
        }
    }
}

Return (!WRITE_DBASE)
    \ ? MessageClip(RT_String("Parse Only\n'%s'",FN))
    \ : MessageClip(RT_String("Parse And Write DBAse\n'%s'\n'%s'",FN,DB))

##############################

Function ChrEatWhite(String S)   {i=1 C=RT_Ord(S,i) While(C==32||C>=8&&C<=13)                        {i=i+1 C=RT_Ord(S,i)} return i>1?MidStr(S,i):S}

# Return extent of string S [ie length from the beginning] that matches any character in the Chars set of characters [Default case insignificant].
# eg StrMatchChrLen("1234.567abcd","0123456789.") = 8
Function StrMatchChrLen(String s,String Chars,Bool "Sig") {
    Function __StrMatchChrLen_LOW(String s,String Chars,int n) { c=s.MidStr(n+1,1) Return(c==""||Chars.FindStr(c)==0) ? n : s.__StrMatchChrLen_LOW(Chars,n+1) }
    Sig=Default(Sig,False)  # Default Case Insignificant
    s=(Sig)?s:s.UCASE   Chars=(Sig)?Chars:Chars.UCASE
    __StrMatchChrLen_LOW(s,Chars,0)
}
2_WriteFixSrt.avs [Write output SRT file].
Code:
# 2_WriteFixSrt.avs

/*
    Requires Avs+, RT_Stats v2.0 Beta 12

    NUMBER OF SUBTITLES MUST MATCH ELSE ABORTS

*/

##################
FN1 =".\Alien_1.srt"           # Same As in 1_ParseAndWriteDBase.avs
FN2 =".\Alien_2.srt"           # Same As in 1_ParseAndWriteDBase.avs
SRT =".\Alien_Out.srt"         # Output Srt file
TIMES_FROM = 1                 # Which DBase to Get Times From
SUBS_FROM  = 2                 # Which DBase to Get Subtitles From
##################

FN1=RT_GetFullPathName(FN1)
FN2=RT_GetFullPathName(FN2)
SRT=RT_GetFullPathName(SRT)
DB1=FN1+".DB"
DB2=FN2+".DB"

###

Records  = RT_DBaseRecords(DB1)
Records2 = RT_DBaseRecords(DB2)
Assert(Records == Records2,RT_String("DBase Subtitle Count MisMatch DB1=%d DB2=%d",Records,Records2))
Assert(1 <= TIMES_FROM <= 2,"1 <= TIMES_FROM <= 2")
Assert(1 <= SUBS_FROM  <= 2,"1 <= SUBS_FROM  <= 2")
TDB = (TIMES_FROM==1) ? DB1 : DB2
SDB = (SUBS_FROM ==1) ? DB1 : DB2
RT_FileDelete(SRT)            # Prep for write
###

for(i=0,Records-1) {
    TimeS = RT_DBaseGetField(TDB,i,0)
    SubS  = RT_DBaseGetField(SDB,i,1)
    RT_WriteFile(SRT,"%d\n%s\n%s\n\n",i+1,TimeS,SubS,Append=True)
}

MessageClip("All Done")
I tested with the exact same srt file for both DBases, and wrote the output srt file.
Compared the output srt with one of the inputs in KDiff and it said "Binary Equal", ie exactly the same as the source srt [both input srt files were identical].
Then changed the last subtitle's time millisecs to "000" in the source 1 srt, and a single word in the last subtitle's text in the source 2 srt,
and re-ran both scripts.
A KDiff compare of the output against each individual input flagged only the deliberate change in each, so it seems pretty spot on if
your subs files correspond exactly 1:1.
It will throw an error if the number of subs differs between the input files.
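
If script 2 aborts on the count mismatch, a quick peek at both DBases may help to find where the two files drift apart. What follows is only a rough sketch and not one of the two scripts above: the name 3_PeekDBase.avs and the SHOW setting are made up for illustration. It uses only RT_Stats calls that already appear in the scripts above, and assumes 1_ParseAndWriteDBase.avs has been run for both srt files.

```avisynth
# 3_PeekDBase.avs  [hypothetical helper, NOT one of the two scripts above]

##################
FN1  = ".\Alien_1.srt"         # Same As in 1_ParseAndWriteDBase.avs
FN2  = ".\Alien_2.srt"         # Same As in 1_ParseAndWriteDBase.avs
SHOW = 5                       # Assumed setting: how many records to list
##################

DB1 = RT_GetFullPathName(FN1) + ".DB"
DB2 = RT_GetFullPathName(FN2) + ".DB"

Records  = RT_DBaseRecords(DB1)
Records2 = RT_DBaseRecords(DB2)

n = (Records < Records2) ? Records : Records2      # Smaller of the two record counts
n = (n < SHOW) ? n : SHOW                          # Never list more than SHOW records

S = RT_String("DB1=%d records, DB2=%d records\n",Records,Records2)
for(i=0,n-1) {
    T1 = RT_DBaseGetField(DB1,i,0)                 # Field 0 = Times string from DBase 1
    T2 = RT_DBaseGetField(DB2,i,0)                 # Field 0 = Times string from DBase 2
    S  = S + RT_String("%d] %s  |  %s\n",i+1,T1,T2)
}

Return MessageClip(S)
```

Raise SHOW to list more records. If the two time columns stop lining up at some subtitle number, the subs are probably not 1:1 from that point on.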
__________________
I sometimes post sober.
StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace

"Some infinities are bigger than other infinities", but how many of them are infinitely bigger ???

Last edited by StainlessS; 5th May 2020 at 03:51.