Measuring per frame runtime of scripts

zorr · 6th January 2019, 01:00

I'm currently working on the VapourSynth port of AvisynthOptimizer.

I have made some progress: I can now output a similarity metric (currently SSIM, but I'm eager to try some of the other ones like VMAF and GMSD) for each frame into a file.

What I'd like to do next is measure the time it took to render the frames. This information would be saved into the same file. How can I measure this rendering time?

On the AviSynth side I'm using a plugin called AvsTimer with some modifications. I know AviSynth plugins can be called from VapourSynth scripts, but I think the timer would know nothing about the processing VapourSynth is doing, so measuring runtime that way is not possible.

Thanks!

DJATOM · 6th January 2019, 02:04

First,

Code:

import time

Then before main get_frame loop

Code:

start = time.perf_counter()

Then in the loop

Code:

frameTime = time.perf_counter()

Now you can do some math with it.

For a simple frame-by-frame time calculation, the example below should work

Code:

for i in range(clip.num_frames):
    frameStart = time.perf_counter()
    frameSingle = clip.get_frame(i)
    ...
    frameEnd = time.perf_counter()
    frameTime = frameEnd - frameStart

StainlessS · 6th January 2019, 03:03

DJATOM, is there some way to wait until system timer goes TICK, as in below (so as to START timer from known new tick, like a WaitTick type function),
[purpose to be a little more accurate in timing]

Code:

AVSValue __cdecl RT_LocalTimeString(AVSValue args, void* user_data, IScriptEnvironment* env) {
    const bool file = args[0].AsBool(true);
    SYSTEMTIME  st = { 0 };
    if(file) {
        DWORD tick=GetTickCount();
        while(GetTickCount()==tick) // Wait until system clock goes TICK (prevent two separate calls returning same time)
            Sleep(0);
    }
    GetLocalTime(&st);
    char bf[64];
    if(file) {
        sprintf(bf,"%4d%02d%02d_%02d%02d%02d_%03d",st.wYear,st.wMonth,st.wDay,st.wHour,st.wMinute,st.wSecond,st.wMilliseconds);
    } else {
        sprintf(bf,"%4d-%02d-%02d %02d:%02d:%02d.%03d",st.wYear,st.wMonth,st.wDay,st.wHour,st.wMinute,st.wSecond,st.wMilliseconds);
    }
    return env->SaveString(bf);
}

From here: - https://forum.doom9.org/showthread.p...51#post1861951

EDIT: Better would be to wait for a High Precision Event Timer tick, if available, and if your code snippet uses HPET (which I presume it does).

EDIT: Maybe something like

Code:

sometime = time.perf_counter()
starttime = time.perf_counter()
while starttime == sometime:
    time.sleep(0)    # yield while waiting; maybe needed, maybe not, no idea
    starttime = time.perf_counter()
# starttime = starting time after waiting tick
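
Before spinning on the clock, it may be worth checking what resolution `time.perf_counter()` reports on the machine in question. The following is a minimal sketch using only the standard library; `wait_for_tick` is a hypothetical helper name, and on most systems the counter advances so finely that the loop returns almost immediately.

```python
import time


def wait_for_tick():
    """Spin until perf_counter() returns a new value, then return that value."""
    previous = time.perf_counter()
    current = time.perf_counter()
    while current == previous:
        current = time.perf_counter()
    return current


# Properties of the clock that backs perf_counter() on this platform.
info = time.get_clock_info("perf_counter")
print("implementation:", info.implementation)
print("resolution (s):", info.resolution)

# Start a measurement right after the counter has changed.
start = wait_for_tick()
```

If the reported resolution is far below the durations being measured, waiting for a tick is unlikely to change the result noticeably.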

Myrsloik · 6th January 2019, 12:51

If you want per-frame processing times, they simply don't exist and can't be timed well due to threading. Even if you do time each frame, the numbers will be so irregular that you have to average over 100 of them anyway.

DJATOM's script is even worse since it measures performance with a single request. That's always much slower than actual processing speed and doesn't even scale predictably.

Just use the single total time the script took in the end.
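
A minimal sketch of that advice: time the whole run once and derive an average per frame from it. `time_total` and `fake_run` are hypothetical names, and `fake_run` is only a stand-in for whatever actually renders the script (for example handing the clip to the threaded renderer), so no VapourSynth call is shown here.

```python
import time


def time_total(run_script, num_frames):
    """Run the whole job once; return (total_seconds, average_seconds_per_frame)."""
    start = time.perf_counter()
    run_script()
    total = time.perf_counter() - start
    return total, total / num_frames


def fake_run():
    # Stand-in workload (assumption): pretend the renderer needs 50 ms.
    time.sleep(0.05)


total, per_frame = time_total(fake_run, num_frames=100)
print(f"total {total:.3f} s, average {per_frame * 1000:.3f} ms/frame")
```

The average is only meaningful over the whole batch; it says nothing about how long any individual frame took.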

StainlessS · 6th January 2019, 13:33

Problem is Zorr wants to time about 10 frames only (repeatedly), so it is bound to be a bit off/erratic.

DJATOM · 6th January 2019, 16:10

>DJATOM's script is even worse
Indeed, it's only helpful for measuring frame run time in serial execution (for debugging purposes). I thought it was obvious. For async execution just use asyncio, example here: https://github.com/Infiziert90/getna...native.py#L140

zorr · 6th January 2019, 22:41

I'm aware of the challenges in measuring the runtime. AvisynthOptimizer is trying to offset the inaccuracy of timing by rounding the total runtimes to 10ms (and the rounding is adjustable). There's also a verification mode which takes the best results found and runs them again multiple times returning the average, min, max or median of the results. I personally prefer the median in timing measurements because it's pretty much immune to outliers. So the timing measurement doesn't need to be perfect, it's still useful information for the optimizer.
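
The rounding and the repeated-run statistics described above can be sketched like this. The function names and the sample runtimes are made up for illustration and are not taken from AvisynthOptimizer itself.

```python
import statistics


def round_runtime(seconds, step=0.010):
    """Round a runtime to the nearest multiple of `step` seconds (10 ms by default)."""
    return round(seconds / step) * step


def summarize(runtimes):
    """Return the statistics a verification pass might report."""
    return {
        "average": statistics.mean(runtimes),
        "min": min(runtimes),
        "max": max(runtimes),
        "median": statistics.median(runtimes),
    }


# Made-up runtimes in seconds from five repeated runs; one is an outlier.
runs = [1.204, 1.198, 1.211, 3.950, 1.201]
summary = summarize(runs)
print(summary)
print(round_runtime(summary["median"]))
```

With these numbers the median stays at 1.204 s while the average is pulled up to roughly 1.75 s by the single slow run, which is the reason for preferring the median.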

Thanks DJATOM for the code snippets, you helped me on the right track. I want to keep the caller responsible for selecting which frames will be measured (that will be useful in an upcoming feature) so I can't use a loop to iterate all frames. I'm sure your example with Futures would work but I ended up with a simpler model (at least in my non-Pythonian mind):

Code:

import functools
import time

frame_times = {}
def startMeasurement(n, clip):
	frame_times[n] = time.perf_counter()
	return clip

video = video.std.FrameEval(functools.partial(startMeasurement, clip=video))

# process the video

def endMeasurement(n, f, clip):
	frame_time = time.perf_counter() - frame_times[n]
	with open(output_file, 'a+') as file:
		file.write(str(frame_time)+'\n')
	return clip

video = video.std.FrameEval(functools.partial(endMeasurement, clip=video), prop_src=[video])

I'm calling FrameEval before the actual processing starts and store the current time into a dictionary. Python's dictionary is thread-safe so this should work even when multiple frames are processed simultaneously, right? Then in the end I call FrameEval again and calculate the processing time for the frame using the stored start time from the dictionary.

It seems to work but I'm sure it's abusing the system in some horrible way.

For some reason the latter FrameEval needs the prop_src argument or the thing doesn't work. Why is that?

[EDIT] I realized that writing to a file is not thread-safe, so I will have to figure out some other way.

The total runtime of the script is less than the sum of individual frame runtimes, which is to be expected when multiple frames are processed simultaneously. So in the end I agree with Myrsloik, it's better to use the total runtime because well... that's how long the runtime actually is.
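
The gap between total runtime and the sum of per-frame times is easy to reproduce without VapourSynth. This sketch uses a thread pool and `time.sleep` as a stand-in for frame processing; the worker count and sleep length are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_frame(n):
    """Pretend to process frame n and return how long it took."""
    start = time.perf_counter()
    time.sleep(0.05)  # stand-in for real filtering work
    return time.perf_counter() - start


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    frame_times = list(pool.map(fake_frame, range(8)))
total = time.perf_counter() - start

print(f"sum of frame times: {sum(frame_times):.3f} s")
print(f"total wall time:    {total:.3f} s")
```

Because four frames are in flight at once, the wall time comes out well below the summed per-frame times.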

ChaosKing · 6th January 2019, 23:51

Are there any advantages if you know all single frame times? Maybe this can be used for temporal filters. The first 1-2 frames are often much slower for denoisers like SMDegrain. I would keep it simple and just use the total runtime.

zorr · 7th January 2019, 00:25

Quote:

Originally Posted by ChaosKing

Are there any advantages if you know all single frame times?

No, not really. But the per frame SSIM (or other similarity metric) is useful.

Quote:

Originally Posted by ChaosKing

I would keep it simple and just use the total runtime.

Yes that's the way to go. I'm going to make it so that the script can output only some of the results per frame.

Mystery Keeper · 7th January 2019, 00:51

VapourSynth Editor has Benchmark mode specifically for measuring how fast the script is processed. It just does the processing with no overhead, discarding the resulting frames. Don't know if that's what you need, but feel free to use the code for reference. Keep in mind that depending on the core settings (thread count), processing is done in parallel.

DJATOM · 7th January 2019, 02:43

> I realized that writing to a file is not thread-safe, so I will have to figure out some other way.
You can collect the time data in a dictionary or list and then iterate over it and write it out on the last frame. I use this approach when I need to get some metrics from a clip.
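
A rough sketch of that idea, with plain threads standing in for the frame callbacks. All names here are hypothetical, and the explicit lock is a conservative assumption rather than something the approach strictly requires. In a real script the final write would be triggered once the last frame has been delivered.

```python
import os
import tempfile
import threading
import time

frame_times = {}
frame_times_lock = threading.Lock()


def record(n, seconds):
    """Store the timing for frame n; safe to call from several threads."""
    with frame_times_lock:
        frame_times[n] = seconds


def write_results(path):
    """Write all collected timings in frame order, one 'frame seconds' pair per line."""
    with frame_times_lock:
        rows = sorted(frame_times.items())
    with open(path, "w") as out:
        for n, seconds in rows:
            out.write(f"{n} {seconds:.6f}\n")


def worker(n):
    start = time.perf_counter()
    time.sleep(0.001)  # stand-in for processing frame n
    record(n, time.perf_counter() - start)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Single-threaded write after all frames are done.
output_path = os.path.join(tempfile.mkdtemp(), "frame_times.txt")
write_results(output_path)
```

Only one thread ever touches the file, so the file itself needs no locking.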

zorr · 7th January 2019, 23:33

Quote:

Originally Posted by Mystery Keeper

VapourSynth Editor has Benchmark mode specifically for measuring how fast the script is processed. It just does the processing with no overhead, discarding the resulting frames. Don't know if that's what you need, but feel free to use the code for reference. Keep in mind that depending on the core settings (thread count), processing is done in parallel.

Thanks, I looked at it. Is it this part in benchmark_dialog.cpp?

Code:

for(int i = firstFrame; i <= lastFrame; ++i)
	m_pVapourSynthScriptProcessor->requestFrameAsync(i);

I'm going to offload the script running task to VSPipe actually. The optimizer itself is Java so I couldn't use your code anyway.
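
When the script runs in an external process, the total runtime can be taken around the process call. The sketch below is in Python only for illustration (the optimizer itself is Java), `time_command` is a hypothetical helper, and the command shown is a stand-in. The real VSPipe command line is deliberately not spelled out here.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run an external command and return (return_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


# Stand-in command (assumption): a Python process that does nothing.
returncode, elapsed = time_command([sys.executable, "-c", "pass"])
print(f"exit code {returncode}, took {elapsed:.3f} s")
```

A measurement taken this way includes process startup and script loading, not just frame processing.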

By the way thanks for VapourSynth Editor, it's really neat!

zorr · 7th January 2019, 23:39

Quote:

Originally Posted by DJATOM

You can compose a dictionary or list with time data and then iterate and write it on the last frame. I'm using such approach when I need to get some metrics from a clip.

That's indeed thread-safe and simple to implement, I like it. The only downside is that you can't get any results until it's all done. It's sometimes nice to see the data when testing something, but then again it's also easy to display the values on the video so it's not a must-have feature.

Someone on StackOverflow recommended using Python's logging module, which is thread-safe. I wonder if it has any major downsides.
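
A minimal sketch of the `logging` route, assuming one `FileHandler` shared by all threads. The logger name, file location and line format are arbitrary choices for illustration.

```python
import logging
import os
import tempfile
import threading

log_path = os.path.join(tempfile.mkdtemp(), "frame_times.log")

logger = logging.getLogger("frame_times")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the lines out of the root logger's output
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)


def report(n, seconds):
    """Append one 'frame seconds' line; the handler serializes concurrent writes."""
    logger.info("%d %.6f", n, seconds)


threads = [
    threading.Thread(target=report, args=(n, 0.001 * n)) for n in range(32)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
handler.flush()
```

Lines appear in the file as soon as they are logged, so results are visible while the script is still running. The likely costs are that lines arrive in completion order rather than frame order, and that every write goes through the handler's lock. Note too that the thread-safety does not extend to several processes writing to the same file.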

