Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
12th March 2018, 18:59 | #1 | Link |
Registered User
Join Date: Mar 2014
Posts: 38
|
Trying to render Polish subtitles with ffmpeg, many characters aren't shown properly
I have downloaded a Polish subtitle file for a movie, and I am trying to render it onto the movie with ffmpeg. As far as I know, the subtitle file is encoded in ISO-8859-2, because when I converted it to the ASS subtitle format I had to use the ffmpeg option
Code:
-sub_charenc iso-8859-2
Even when I render the text with ffmpeg using
Code:
ass=subtitle_file.ass
many characters are still not shown properly. What should I do to render the Polish subtitles properly? Should I use a different (TTF) font? And which (TTF) fonts support Polish? |
13th March 2018, 09:39 | #2 | Link |
Unavailable
Join Date: Mar 2009
Location: offline
Posts: 1,480
|
I know next to nothing about FOSS fonts, so I will recommend some well-known Windows fonts which support the Polish character set:
Arial, Arial Unicode MS, Times New Roman, Tahoma, Georgia, Trebuchet MS, Verdana, Consolas, Lucida Console. Regarding the subtitle file itself: you'd better convert it to a UTF-16 SSA or ASS file with a dedicated subtitle editor, not with ffmpeg. |
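For readers who want to see what the re-encoding part of that advice amounts to, here is a minimal Python sketch. It only changes the text encoding of the file; it does not convert between subtitle formats, which a subtitle editor would handle. The function name `reencode_subtitles` and the file names are hypothetical, and the source encoding is something you have to supply yourself.

```python
from pathlib import Path


def reencode_subtitles(src, dst, src_encoding, dst_encoding="utf-16"):
    """Re-encode a text subtitle file and return the decoded text.

    src_encoding must be the file's real encoding (an assumption made by
    the caller); a wrong guess either raises UnicodeDecodeError or silently
    yields wrong characters.
    """
    # Work on raw bytes so that line endings are preserved untouched.
    text = Path(src).read_bytes().decode(src_encoding)
    # Python's "utf-16" codec writes a byte order mark at the start.
    Path(dst).write_bytes(text.encode(dst_encoding))
    return text
```

A call might look like `reencode_subtitles("subs.srt", "subs_utf16.srt", "iso8859_2")`, with both file names being placeholders.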
13th March 2018, 10:40 | #4 | Link |
Registered User
Join Date: Mar 2009
Location: Germany
Posts: 5,769
|
The OP deserves that if he chooses to stray from well-known and well-implemented methods.
If anything, subtitle processing software is ahead of all other kinds of video software, so there are plenty of choices.
__________________
Born in the USB (not USA) |
13th March 2018, 17:06 | #6 | Link |
Registered User
Join Date: May 2016
Posts: 197
|
1. If your font doesn't support some characters, you get a symbol for unknown glyph (often it is a square). This is not what you see here.
2. Your original subtitle file is probably Windows-1250, not ISO 8859-2. Reason: look at line 105. The IND is Unicode code point 0x84, but the byte 0x84 is not even included in ISO 8859-2; it is undefined there. A proper ISO 8859-2 text can't contain IND, and (barring bugs in the converter) no IND can appear in a file converted from 8859-2 to Unicode. The same goes for STS, code point 0x93. But if we look at Windows-1250, we see that 0x84 is „ and 0x93 is “, which fits perfectly. So iconv seems to pass bytes that are undefined in the input encoding through as the Unicode code points with the same value, and that explains what you see. (Similarly, ST is ś.)
3. Notice that Windows-1250 and ISO 8859-2 do not agree at all positions where both are defined. 0xB9 is š in ISO 8859-2, but ą in Windows-1250. Line 104 contains "patrzą" if it is treated as Windows-1250, but "patrzš" if treated as ISO 8859-2, as you have done. Google Translate doesn't think that "patrzš" is proper Polish (and Google gives just 438 results for a search for it); "patrzą" is recognized as Polish and gives 20,000,000 search results.
Last edited by mkver; 13th March 2018 at 17:19. |
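The byte-level argument in the post above can be checked with a short Python sketch. It uses Python's built-in `cp1250` and `iso8859_2` codecs, whose handling of the 0x80-0x9F range may not match iconv exactly. The helper names `describe` and `repair` are made up for illustration, and `repair` only helps if the text really was Windows-1250 decoded as ISO 8859-2.

```python
# Bytes discussed above, decoded under both candidate encodings.
SAMPLE_BYTES = [0x84, 0x93, 0x9C, 0xB9]


def describe(byte_value):
    """Return (character under Windows-1250, character under ISO 8859-2)."""
    raw = bytes([byte_value])
    return raw.decode("cp1250"), raw.decode("iso8859_2")


def repair(mis_decoded):
    """Undo a wrong ISO 8859-2 decode of text that was really Windows-1250."""
    # Recover the original bytes, then decode them with the right table.
    return mis_decoded.encode("iso8859_2").decode("cp1250")


if __name__ == "__main__":
    for b in SAMPLE_BYTES:
        cp, l2 = describe(b)
        print(f"0x{b:02X}: cp1250={cp!r} U+{ord(cp):04X}, iso8859_2=U+{ord(l2):04X}")
    print(repair("patrz\u0161"))  # the mis-decoded word from line 104
```

In Python's tables, 0x84, 0x93 and 0x9C come out of the ISO 8859-2 decode as the C1 control code points with the same values (the IND, STS and ST seen in the converted file), while Windows-1250 turns them into „, “ and ś.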