Some quick performance numbers on an RTX 3070 Ti with a 720x480 clip (noisy source).
Tested in vsedit, so the real numbers should be slightly higher because of the editor's small overhead.
Code:
gmfss_union(clip) # ~14fps
gmfss_union(clip, trt=True) # ~17fps, tensor cache took a long time to build
gmfss_union(clip, num_streams=2) # ~17.5fps
gmfss_union(clip, num_streams=3) # ~18.9fps
gmfss_union(clip, trt=True, num_streams=2) # ~21.5fps
gmfss_union(clip, trt=True, num_streams=3) # ~22.5fps
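
To make the numbers easier to compare, here is a small sketch that turns them into speedup factors relative to the default call. The fps values are just the approximate figures listed above; the `results` dict and the `speedups` helper are names made up for this illustration and are not part of the plugin.

```python
# Approximate fps figures copied from the measurements above
# (RTX 3070 Ti, 720x480 clip, measured in vsedit).
results = {
    "default": 14.0,
    "trt=True": 17.0,
    "num_streams=2": 17.5,
    "num_streams=3": 18.9,
    "trt=True, num_streams=2": 21.5,
    "trt=True, num_streams=3": 22.5,
}


def speedups(fps_by_config, baseline="default"):
    """Return each configuration's fps divided by the baseline fps."""
    base = fps_by_config[baseline]
    return {name: fps / base for name, fps in fps_by_config.items()}


if __name__ == "__main__":
    for name, factor in speedups(results).items():
        print(f"{name:<26} {results[name]:>5.1f} fps  x{factor:.2f}")
```

On these figures, trt=True with num_streams=3 comes out at roughly 1.6x the default call, and the step from 2 to 3 streams is much smaller than the step from 1 to 2.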