Old 12th March 2018, 18:59   #1  |  Link
konstantin1
Registered User
 
Join Date: Mar 2014
Posts: 25
Trying to render Polish subtitles with ffmpeg, many chars aren't shown properly

I have downloaded a Polish subtitle file for a movie, and I am trying to render it onto the movie with ffmpeg. As far as I know, the downloaded subtitle file is encoded in ISO 8859-2, because when I converted it to the ASS subtitle format, I had to use the ffmpeg option
Code:
-sub_charenc iso-8859-2
After that I can see many accented Polish chars in the now UTF-8 encoded .ass subtitle file. However, some of the accented Polish chars don't show up correctly. For example, when I open the file with the Geany text editor, I see the following:


Even when I try to render the text with ffmpeg
Code:
ass=subtitle_file.ass
option, I get similar results: special Polish characters with accents don't appear properly in the rendered text.

What should I do to render the subtitles properly in Polish? Should I use a different (TTF) font? And which (TTF) fonts support Polish?
Old 13th March 2018, 09:39   #2  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,477
I know next to nothing about FOSS fonts, so I will recommend some well-known Windows fonts which support the Polish character set:

Arial, Arial Unicode MS, Times New Roman, Tahoma, Georgia, Trebuchet MS, Verdana, Consolas, Lucida Console.

Regarding the subtitle file itself: you'd better convert it to a UTF-16 SSA or ASS file with a dedicated subtitle editor, not with ffmpeg.
Old 13th March 2018, 10:33   #3  |  Link
Dulus_No
Registered User
 
Join Date: Sep 2016
Posts: 7
https://bboxtype.com/typefaces/FiraGO/
Old 13th March 2018, 10:40   #4  |  Link
Ghitulescu
Registered User
 
 
Join Date: Mar 2009
Location: Germany
Posts: 5,587
The OP deserves that if he chooses to stray from well-known and well-implemented methods.

If anything, subtitle processing software is ahead of all other types of video software, so there are plenty of choices.
__________________
Born in the USB (not USA)
Old 13th March 2018, 15:54   #5  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,120
Quote:
Originally Posted by Midzuki
Regarding the subtitle file itself: you'd better convert it to a UTF-16 SSA or ASS file with a dedicated subtitle editor, not with ffmpeg.
UTF-8 is the de facto standard for ASS. (And the most common container for ASS, Matroska, uses UTF-8 for all text as well.)
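For anyone who wants to re-encode the subtitle file to UTF-8 without going through ffmpeg, here is a minimal Python sketch. The function name `transcode_to_utf8` and the default source encoding `cp1250` are assumptions for illustration only; the source encoding has to match whatever the file really uses.

```python
from pathlib import Path


def transcode_to_utf8(src: Path, dst: Path, src_encoding: str = "cp1250") -> None:
    """Decode `src` using `src_encoding` and write the same text to `dst` as UTF-8.

    Hypothetical helper, not part of ffmpeg or any subtitle tool.
    """
    raw = src.read_bytes()
    # Strict decoding raises UnicodeDecodeError if a byte is not assigned
    # in the chosen encoding, which is a hint that the guess was wrong.
    text = raw.decode(src_encoding)
    # Working on bytes keeps the original line endings untouched.
    dst.write_bytes(text.encode("utf-8"))
```

Usage would look like `transcode_to_utf8(Path("movie.srt"), Path("movie.utf8.srt"))`, where both file names are placeholders.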
Old 13th March 2018, 17:06   #6  |  Link
mkver
Registered User
 
Join Date: May 2016
Posts: 138
1. If your font doesn't support some characters, you get a symbol for an unknown glyph (often a square). This is not what you see here.
2. Your original subtitle file is probably Windows-1250, not ISO 8859-2. Reason: look at line 105. IND is Unicode code point U+0084, a control character, and byte 0x84 is not even included in ISO 8859-2; it is undefined there. A proper ISO 8859-2 text can't contain IND, and (barring bugs in the converter) no IND can appear in a file converted from ISO 8859-2 to Unicode. The same goes for STS, code point U+0093. But if we look at Windows-1250, we see that 0x84 is „ and 0x93 is “, which fits perfectly. So iconv seems to pass undefined bytes from the input file through as the Unicode code points with the same value, and that explains what you see.
(Similarly, ST, code point U+009C, corresponds to ś, which is 0x9C in Windows-1250.)
3. Note that Windows-1250 and ISO 8859-2 disagree at some positions where both are defined. 0xB9 is š in ISO 8859-2, but ą in Windows-1250. Line 104 contains "patrzą" if it is treated as Windows-1250, but "patrzš" if treated as ISO 8859-2, as you did. Google Translate doesn't think that "patrzš" is proper Polish (and a Google search for it gives just 438 results); "patrzą" is recognized as Polish and gives 20,000,000 search results.
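The byte values discussed in the list above can be checked with a short Python sketch. It relies on Python's `iso8859_2` codec mapping bytes 0x80-0x9F to the C1 control code points of the same value, which resembles what iconv appears to do here. The names `decode_both` and `repair` are made up for this illustration.

```python
from typing import Tuple


def decode_both(raw: bytes) -> Tuple[str, str]:
    """Decode the same bytes as ISO 8859-2 and as Windows-1250."""
    return raw.decode("iso8859_2"), raw.decode("cp1250")


def repair(misdecoded: str) -> str:
    """Undo a wrong ISO 8859-2 decode: recover the bytes, then decode as Windows-1250.

    Only valid under the assumption that the original file really was Windows-1250.
    """
    return misdecoded.encode("iso8859_2").decode("cp1250")


if __name__ == "__main__":
    # 0xB9 is the last byte of the word from line 104.
    print(decode_both(b"patrz\xb9"))   # ('patrzš', 'patrzą')
    # The control characters IND, STS and ST turn back into „ “ ś.
    print(repair("\x84\x93\x9c"))      # „“ś
```

The same `repair` idea could be applied to the text of the already converted .ass file, but re-running the conversion with the Windows-1250 charset from the original file is the cleaner route.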

Last edited by mkver; 13th March 2018 at 17:19.