Old 18th August 2020, 19:13   #1181  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,544
Win10 trying to keylog things ? Muuuhahahaaa...

"It took then several years to make it usable",
yes I am with you regarding both OS XP and 7 and hopefully won't have to do the same again with 10 too soon.

WinXP32ProSP3 and Win7U64SP1 here.
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..."
Old 21st August 2020, 15:14   #1182  |  Link
nekrovski
Registered User
 
Join Date: Oct 2011
Posts: 57
Can anyone help me with Subtitle Edit's regex?
I would like to use Find, to find only double lines that both start with a dash -
Old 24th August 2020, 11:39   #1183  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
@robena:
SE calls tesseract.exe for each image. Tesseract.exe itself uses multithreading. Running multiple OCR windows with Tesseract will probably use all threads pretty fast.
Using one of the other OCR methods will give better results for you when running in parallel.
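For anyone scripting Tesseract outside SE, here is a minimal sketch of running several Tesseract processes side by side with one thread each, so they do not compete for the same cores. It assumes a Tesseract build with OpenMP, where the standard OpenMP variable OMP_THREAD_LIMIT caps the threads per process. The function names, file names and worker count are made up for illustration, and this is not how SE itself calls tesseract.exe.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_job(image_path, output_base, lang="eng", threads_per_process=1):
    """Return (command, environment) for one tesseract call.

    OMP_THREAD_LIMIT is a standard OpenMP environment variable; it only has
    an effect if the tesseract binary was built with OpenMP (assumption).
    """
    command = ["tesseract", image_path, output_base, "-l", lang]
    environment = dict(os.environ, OMP_THREAD_LIMIT=str(threads_per_process))
    return command, environment


def run_jobs(image_paths, workers=4):
    """OCR all images, running at most `workers` tesseract processes at once."""
    def run_one(image_path):
        output_base = os.path.splitext(image_path)[0]  # tesseract appends .txt
        command, environment = build_job(image_path, output_base)
        return subprocess.run(command, env=environment, check=True)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, image_paths))
```

run_jobs() needs tesseract on the PATH. Whether this actually beats a single multithreaded process depends on the machine and the Tesseract version, so it is worth timing both.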

>The problem is that with a 3 times faster system, Subtitle Edit is much much slower.
You're talking about Tesseract 5 vs Tesseract 3? Yes, that's probably correct.


@nekrovski: You can try this:
Code:
 -.+\n-
Old 24th August 2020, 15:12   #1184  |  Link
nekrovski
Registered User
 
Join Date: Oct 2011
Posts: 57
Quote:
Originally Posted by Nikse555 View Post
@nekrovski: You can try this:
Code:
 -.+\n-
Thanks a lot, works.
Old 25th August 2020, 23:14   #1185  |  Link
loninapleton
Registered User
 
Join Date: Jun 2019
Posts: 60
disappearing subtitles

I had a DVD rip which showed a VOB sub. It shows in programs like MKVmerge but won't display in Daum or VLC. Where did it go?
I used Subtitle Edit to extract the VOB sub and save it as SRT for subtitle compatibility.

The workaround I have tried is to delete the VOB sub in the original, then re-encode with HandBrake, adding in the SRT. It's encoding now.

What makes this so odd is that the text from the VOB sub looked fine and complete in SRT format when viewing it in Notepad++.
Old 26th August 2020, 00:43   #1186  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@nekrovski, @Nikse555

Quote:
Originally Posted by nekrovski View Post
Can anyone help me with Subtitle Edit's regex?
I would like to use Find, to find only double lines that both start with a dash -
Quote:
Originally Posted by Nikse555 View Post
@nekrovski: You can try this:
Code:
 -.+\n-
In my opinion, "\A" should be added before the expression "-.+\n-". Then the first "-" will be matched only at the very beginning of the subtitle text, not in the middle of a line.
The entire expression would be "\A-.+\n-".
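
For anyone who wants to see the difference between the two expressions outside SE, here is a minimal Python sketch. It assumes the pattern is tested against each subtitle's text separately and that the lines are separated by "\n"; the sample texts and the helper name are invented for illustration.

```python
import re

UNANCHORED = re.compile(r"-.+\n-")    # pattern from the post above
ANCHORED = re.compile(r"\A-.+\n-")    # same pattern with \A in front


def matches(pattern, subtitle_text):
    """True if the pattern is found anywhere in one subtitle's text."""
    return pattern.search(subtitle_text) is not None


samples = [
    "- Hi.\n- Hello.",             # both lines start with a dash
    "Wait - what?\n- Nothing.",    # dash in the middle of line 1
    "- Just one line.",            # single line, no second dash
]
for text in samples:
    print(repr(text), matches(UNANCHORED, text), matches(ANCHORED, text))
```

Only the second sample separates the two: the unanchored pattern accepts it because of the dash in the middle of the first line, while the \A version rejects it.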
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 26th August 2020 at 00:58.
Old 26th August 2020, 12:51   #1187  |  Link
nekrovski
Registered User
 
nekrovski's Avatar
 
Join Date: Oct 2011
Posts: 57
Quote:
Originally Posted by Janusz View Post
@nekrovski, @Nikse555

In my opinion, "\A" should be added before the expression "-.+\n-". Then the first "-" will be matched only at the very beginning of the subtitle text, not in the middle of a line.
The entire expression would be "\A-.+\n-".
Thank you.

This is going to sound super nitpicky, but sometimes the "break long lines" option does this to a long line:
Code:
Though this trip to Tochigi was pretty far,
too.
The "too" goes to the second line, and I really dislike it when the first line is really long and the second holds only a single word.

Is there a way to prevent this without manually checking in the "fix common errors" window? When there's only a handful of "break long lines" suggestions, I can check them. But when there are 50 or so, it puts a strain on my eyes/brain to check each one manually.

So as a workaround, after I apply "break long lines", I'm looking for an option that will let me find/search/display only the two-line subtitles in which there's a significant difference between the number of characters in each line. Ideally, I would be able to specify the difference myself.

Is there such a thing?
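
Independent of whatever SE offers for this, the check itself can be sketched as a small Python script run on a saved .srt file. This is only a rough sketch: the function names and the default threshold of 20 characters are made up, the SRT parsing is deliberately simple, and formatting tags such as <i> are counted as characters.

```python
import re


def two_line_cues(srt_text):
    """Yield (cue_number, line1, line2) for every two-line cue in SRT text."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n(?:[ \t]*\n)+", normalized):
        lines = block.split("\n")
        # Expected layout: number, "start --> end", then exactly two text lines.
        if len(lines) == 4 and "-->" in lines[1]:
            yield lines[0].strip(), lines[2], lines[3]


def unbalanced_cues(srt_text, min_difference=20):
    """Return cues whose two lines differ in length by at least min_difference."""
    found = []
    for number, first, second in two_line_cues(srt_text):
        difference = abs(len(first) - len(second))
        if difference >= min_difference:
            found.append((number, difference, first, second))
    return found


example = """1
00:00:01,000 --> 00:00:04,000
Though this trip to Tochigi was pretty far,
too.

2
00:00:05,000 --> 00:00:08,000
This one is split
fairly evenly.
"""
for number, difference, first, second in unbalanced_cues(example):
    print(number, difference, repr(first), repr(second))
```

With the example above, only cue 1 is reported; raising or lowering min_difference sets how large the gap has to be before a cue is listed.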
Old 27th August 2020, 14:39   #1188  |  Link
robena
Registered User
 
Join Date: May 2007
Posts: 29
Quote:
Originally Posted by Nikse555 View Post
@robena:
S
>The problem is that with a 3 times faster system, Subtitle Edit is much much slower.
You're talking about Tesseract 5 vs Tesseract 3? Yes, that's probably correct.
No, I'm talking about the fact that when OCR is finished, each window on Windows 10 takes more than 5 seconds to react when I click on a sentence needing manual input.

Even Tesseract 3 is slower on Windows 10, by the way, but that's not a big problem since it's not something I do interactively; I do something else until it's finished.

What is insufferable on Windows 10 is this:

1) I click in a sentence needing manual correction

2) It may take up to 8 seconds before the window reacts and I can work.

That happens only when many OCR windows are open at the same time.

My system has 64GB of memory and 20 threads, and I use less than that, so the problem is elsewhere.

That does not happen with other software. I can have 10 Firefox windows open with 10 tabs each, and typing in a tab responds instantly.

That does not happen with Windows 7.

Last edited by robena; 27th August 2020 at 14:46.
Old 28th August 2020, 09:10   #1189  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@Nikse555
On August 25, 2020, on the main page of the program, @MagratG wrote in a comment on version 3.5.16:
"Query: My temp directory is filling up with 1000s of png files,
the subtitle images, that are not being deleted after closing the program. "


Looking at my "temp" directory, I can see that SE also automatically creates files with similar names,
e.g. a9474388-3f2f-4ae9-b73b-5bff0e0bec39.ass, which are likewise not deleted after exiting the program.

These files are created if "mpv" is selected as the video engine in the program options
and only if the "mpv handles preview text" option is checked.
Sometimes several different files with the same content are created in quick succession for one and the same subtitle.
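
To see how many such leftovers have piled up, a small listing script can help. The sketch below assumes the files sit directly in the user's temp directory and that both the .ass and the .png files carry GUID-style names like the one quoted above; the function name is made up, and nothing is deleted.

```python
import os
import re
import tempfile

# GUID-style name such as a9474388-3f2f-4ae9-b73b-5bff0e0bec39.ass
GUID_FILE = re.compile(
    r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\.(?:ass|png)$",
    re.IGNORECASE,
)


def leftover_files(directory=None):
    """Return (name, size_in_bytes) for GUID-named .ass/.png files, sorted by name."""
    directory = directory or tempfile.gettempdir()
    found = []
    for entry in os.scandir(directory):
        if entry.is_file() and GUID_FILE.match(entry.name):
            found.append((entry.name, entry.stat().st_size))
    return sorted(found)
```

Calling leftover_files() without an argument scans the current user's temp directory. Other programs may use GUID-style names too, so the list should be checked before removing anything by hand.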

****************
Edit 30/08/2020
I checked version 3.5.16 Beta 134 - the problem described above no longer exists.
Thank you.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 30th August 2020 at 08:44.
Old 31st August 2020, 17:33   #1190  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
@Janusz: Cool, thx for reporting/testing
fixed via this commit: https://github.com/SubtitleEdit/subt...04e063cbcb9ff7
Old 11th September 2020, 00:01   #1191  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@Nikse555
Bug in the stable version of Subtitle Edit 3.5.16 and above.
"OCR auto correction" does not respect the options you set.



As you can see in the picture, all "OCR auto correction" options other than the dictionary are disabled,
and yet the OCR made 13 corrections, although it should not have. All fixes can be seen in the [All Fixes] tab.
The described situation occurs only for italics (see lines 521 and 524), and it always happens
regardless of whether I use pol_OCRFixReplaceList.xml or not.
I checked other texts with and without italics - the problem occurs with all files.
I also checked the stable version 3.5.15 - the problem does not occur there.

The remaining tabs, [Guesses used] and [Unknown words], are filled in as expected:
[Guesses used] is empty and [Unknown words] contains the unrecognized words.

For those who do not know Polish, the good news is
that all corrections were made flawlessly.


Edit 17-09-2020

I checked Subtitle Edit beta 184 - the "All fixes" list is no longer populated for the case described above.

Another problem has appeared - it concerns the "Subtitle text" window.
In the picture above, with a language selected, lines that OCR detected completely without errors have a green text background,
lines with whole unrecognized words have a yellow background, and lines with unrecognized single characters have a brown background.
This makes it possible to locate an erroneous line and its error type quickly by eye. And that's great.
In beta 184 this behaviour is lost: despite selecting a dictionary in the [Dictionary] field, all text from the first to the last line has a white background, as if no dictionary were specified (None).
The background coloring is only restored when "Fix OCR errors" is checked. Until now, this worked without having to select that option.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 19th September 2020 at 11:05.
Old 11th September 2020, 15:57   #1192  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,564
I was playing with the tesseract sources and compiled an x64 build.

No time to test, take it as it is.
Code:
tesseract 5.0.0-alpha-781-gb19e3
 leptonica-1.81.0
  libjpeg 8d (libjpeg-turbo 2.0.5) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11
 Found AVX
 Found SSE
 Found OpenMP 201511
 Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.5
__________________
@turment on Telegram

Last edited by tormento; 11th September 2020 at 16:00.
Old 12th September 2020, 00:29   #1193  |  Link
loninapleton
Registered User
 
Join Date: Jun 2019
Posts: 60
new OCR screen options

The latest version 3.5.16 which I downloaded just for updates shows a new screen that pops up for OCR which I don't know how to use.

Is it preferable to demux the VOB and avoid this screen rather than trying to drag and drop an MKV which is what I did?
Old 19th September 2020, 05:07   #1194  |  Link
loninapleton
Registered User
 
Join Date: Jun 2019
Posts: 60
Quote:
Originally Posted by loninapleton View Post
The latest version 3.5.16 which I downloaded just for updates shows a new screen that pops up for OCR which I don't know how to use.

Is it preferable to demux the VOB and avoid this screen rather than trying to drag and drop an MKV which is what I did?

I am the OP. I have fixed things. I did a fresh install with the translation box un-ticked, Tesseract 5 selected and downloaded for VOB, and English installed as the dictionary language.

Perhaps someone can say whether ticking the translation box activated that pop-up screen I did not know what to do with. I had an older copy on a different machine and reverted to that, looking for differences.
Old 23rd September 2020, 22:33   #1195  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
@Nikse555

A.
In my opinion, Subtitle Edit does not manage the computer's RAM properly.
I prepared this description using version 3.5.16 beta 222.
The version doesn't matter, though. I checked previous stable versions 16, 15, 14, all the way to 10.
With the same operations, the results are similar everywhere. But it gets worse from version to version,
so that in version 10 it takes up to 2 GB of RAM during the first loading.

The launch of the program itself is OK. The RAM usage increases slightly from version to version.
That is understandable: the program is growing, new functions are added, and this requires space.

The RAM usage figures are taken from Task Manager.
Test file: an mpeg-ts file containing 1 video stream, 2 audio streams and 1 stream with DVBSUB subtitles.
File size: 9,793,003 KB.

After starting, in my case, the program takes up 20.3 MB of RAM.
1. I drop the ts file on the main program window, and parsing of the file starts.
After it completes, the Import / OCR Vobsub ... window opens. - RAM = 88.4 MB.
In the window I choose [Cancel] and go back to the main window - RAM = 88.4 MB !!! Why?
2. I do the same as in step 1 again.
The RAM usage drops to 78 MB, and when the Import / OCR Vobsub ... window opens, it shows RAM = 131.8 MB.
I choose [Cancel] again, and the program still occupies 139 MB of RAM.
So if I keep repeating the operations from point 1, the program will eventually take up all the RAM.
The program also does not release the memory if I select [OK] and then remove the subtitles from the main window by selecting [File / NEW].
In this case, the program takes up RAM even faster.

B.
The second thing concerns the parsing of the file itself. While the file is being parsed, the progress is shown in %.
With every file, I run into a situation where the progress counter stops - the numbers stop changing.
During this time, the system message "(no response)" is displayed next to the program name and version in the program's title bar.
The program nevertheless continues to work, because after a shorter or longer time the progress jumps ahead by a few,
or even several dozen, %. I get one to several such stalls during the analysis of a file.
The file analysis itself works and completes fine, but these counter stalls and messages are annoying.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 23rd September 2020 at 22:58.
Old 27th September 2020, 16:57   #1196  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
Subtitle Edit beta 232 crashes when trying to import a ts file.
The same file in beta 222 opens correctly.
__________________
Sorry for my mistakes - I'm using a translator.

Last edited by Janusz; 27th September 2020 at 17:42.
Old 28th September 2020, 11:46   #1197  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
@Janusz: Do you still get the crash in the latest beta: https://github.com/SubtitleEdit/subt...leEditBeta.zip ? (beta 240)
The beta also fixes an issue where BD sups lost overlapping subtitles: https://github.com/SubtitleEdit/subt...it/issues/4392
In general, .NET programs do not manage memory release themselves; the runtime's garbage collector decides when memory is freed.
Old 28th September 2020, 14:13   #1198  |  Link
Janusz
Registered User
 
Join Date: Apr 2020
Location: Poland
Posts: 143
Quote:
Originally Posted by Nikse555 View Post
@Janusz: Do you still get the crash in the latest beta 240?
Thank you @Nikse555, in beta 240 the earlier file now opens correctly. I also checked a few other ts files - they also open without problems.
__________________
Sorry for my mistakes - I'm using a translator.
Old 29th September 2020, 11:26   #1199  |  Link
Nikse555
Registered User
 
Join Date: Feb 2004
Location: Mars
Posts: 428
Quote:
Originally Posted by Janusz View Post
Thank you @Nikse555, in beta 240 the earlier file now opens correctly. I also checked a few other ts files - they also open without problems.
Cool, thx for testing
Old 30th September 2020, 09:03   #1200  |  Link
von Suppé
Registered User
 
Join Date: Dec 2013
Posts: 630
Hi Nikse555,

I don't know whether this has already been addressed, but now that I think about it:

Is it possible to load a SUP and/or XML/PNG file into SE without OCR-ing it, only to adjust the timecodes? And after that, export back to SUP or XML/PNG, without changing the original subtitle images and their X/Y coördinates? Preferably, of course, with realtime monitoring in the preview with a chosen video.

I would be very happy if that's possible.
Cheers
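
For the simplest case, a constant shift of every timecode in a Blu-ray SUP, the idea can be sketched in Python without touching the images at all. This is a rough sketch outside SE, not SE's method. It assumes the commonly documented PGS layout: each segment starts with a 13-byte header ("PG", 4-byte PTS, 4-byte DTS, 1-byte segment type, 2-byte payload size) and timestamps run on a 90 kHz clock. The function name is made up, and the DTS field is left exactly as found.

```python
import struct

HEADER = struct.Struct(">2sIIBH")  # magic, PTS, DTS, segment type, payload size
TICKS_PER_MS = 90                  # assumption: PGS timestamps use a 90 kHz clock


def shift_sup(data, offset_ms):
    """Return a copy of the SUP bytes with every PTS shifted by offset_ms."""
    shifted = bytearray(data)
    position = 0
    while position < len(shifted):
        if position + HEADER.size > len(shifted):
            raise ValueError("truncated segment header")
        magic, pts, dts, seg_type, size = HEADER.unpack_from(shifted, position)
        if magic != b"PG":
            raise ValueError(f"no PG magic at byte {position}")
        # Clamp so the result still fits the unsigned 32-bit PTS field.
        new_pts = min(max(pts + offset_ms * TICKS_PER_MS, 0), 0xFFFFFFFF)
        HEADER.pack_into(shifted, position, magic, new_pts, dts, seg_type, size)
        # The payload (images, palettes, positions) is skipped untouched.
        position += HEADER.size + size
    return bytes(shifted)
```

Typical use would be to read the file with open(path, "rb"), pass the bytes through shift_sup(data, 500) for a half-second delay, and write the result to a new file, keeping the original as a backup.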