Bugzilla – Full Text Bug Listing
| Summary: | Unprintable characters in system error messages | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.2 | Reporter: | Don Hughes <support> |
| Component: | Basesystem | Assignee: | Stanislav Brabec <sbrabec> |
| Status: | RESOLVED UPSTREAM | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | jengelh, ohering |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 12.2 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Don Hughes
2012-09-17 23:38:40 UTC
Hi, could you please have a look at this? I am not sure whether it is right to assign it to you. Feel free to reassign it. Thanks.

Some application wants to log the UTF-8 string ‘badfile’, but one of the layers incorrectly assumes that the application sends the string in windows-1252 encoding and ‘encodes’ it to UTF-8 a second time. This is the result in a UTF-8 locale:

    echo '‘badfile’' | iconv -s -f windows-1252 -t UTF-8
    â€˜badfileâ€™

Could you provide more information? Which application logs this string? What is the locale setting in /etc/sysconfig/language?

It could also happen if the application works correctly but you are viewing the log in an application that uses a different text encoding and does not understand UTF-8.

The example is from running the system cp command with the output displayed on the console (or captured to a file). Here is my ../sysconfig/language:

    ## Path: System/Environment/Language
    ## Description:
    ## Type: string(POSIX,ca_ES.ISO-8859-1,ca_ES.UTF-8,cs_CZ.ISO-8859-2,cs_CZ.UTF-8,da_DE@euro,da_DK.ISO-8859-1,da_DK.UTF-8,de_DE@euro,de_DE.ISO-8859-1,de_DE.UTF-8,el_GR.ISO-8859-7,el_GR.UTF-8,en_GB.ISO-8859-1,en_GB.UTF-8,en_IE@euro,en_IE.ISO-8859-1,en_US.ISO-8859-1,es_ES@euro,es_ES.ISO-8859-1,es_ES.UTF-8,fr_FR@euro,fr_FR.ISO-8859-1,fr_FR.UTF-8,gl_ES@euro,gl_ES.ISO-8859-1,gl_ES.utf-8,hr_HR.ISO-8859-2,hu_HU.ISO-8859-2,hu_HU.UTF-8,it_IT@euro,it_IT.ISO-8859-1,it_IT.UTF-8,ja_JP.eucJP,ja_JP.UTF-8,lt_LT.ISO-8859-13,lt_LT.UTF-8,nl_NL@euro,nl_NL.ISO-8859-1,nl_NL.UTF-8,ru_RU.ISO-8859-5,ru_RU.KOI8-R,ru_RU.UTF-8,sk_SK.ISO-8859-2,sk_SK.UTF-8,tr_TR.ISO-8859-9,tr_TR.UTF-8,ko_KR.eucKR,ko_KR.UTF-8,zh_TW.Big5,zh_TW.UTF-8,zh_CN.GB2312,zh_CN.UTF-8)
    ## Default: ""
    ## Config: OpenOffice.org,groff,ispell,kde,kdm,profiles,susehelp,susewm,tetex,wdm
    #
    #
    # Local users will get RC_LANG as their default language, i.e. the
    # environment variable $LANG . $LANG is the default of all $LC_*-variables,
    # as long as $LC_ALL is not set, which overrides all $LC_-variables.
    # Root uses this variable only if ROOT_USES_LANG is set to "yes".
    #
    RC_LANG="en_US.UTF-8"

    ## Type: string
    ## Default: ""
    #
    # This variable will override all LC-variables!!
    # Again, ROOT_USES_LANG must be set to "yes", if an effect on the superuser
    # account is desired.
    #
    RC_LC_ALL=""

    ## Type: string
    ## Default: ""
    #
    # This defines the locale in which messages of programs and
    # libraries with i18n-support should appear if a translated
    # message catalog for the library or the program is installed.
    # This also provides localized yes/no answers.
    #
    RC_LC_MESSAGES=""

    ## Type: string
    ## Default: ""
    #
    # This defines the locale for character handling and classification.
    # The libc uses this value in language dependent function calls, such
    # as e.g. uppercase/lowercase mapping of foreign characters.
    #
    RC_LC_CTYPE=""

    ## Type: string
    ## Default: ""
    #
    # This defines the locale for sorting strings and characters.
    # It is used by the libc to obtain the alphabetical order of characters
    # (e.g. for string comparisons).
    #
    RC_LC_COLLATE=""

    ## Type: string
    ## Default: ""
    #
    # This defines the locale for date and time output formats.
    # i.e.: 06/09/1999 vs. 09.06.1999
    #
    RC_LC_TIME=""

    ## Type: string
    ## Default: ""
    #
    # This defines the locale for formatting and reading numbers.
    # i.e.: 1,234.56 vs. 1.234,56
    #
    RC_LC_NUMERIC=""

    ## Type: string
    ## Default: ""
    #
    # This defines the locale for formatting and reading money values.
    #
    RC_LC_MONETARY=""

    ## Type: string
    ## Default: ""
    #
    # This defines the locale for format of paper.
    #
    RC_LC_PAPER=""

    ## Type: string(ctype)
    ## Default: ctype
    #
    # This defines if the user "root" should use the locale settings
    # which are defined here.
    # Value "ctype" means that root uses just LC_CTYPE.
    #
    ROOT_USES_LANG="yes"

    ## Type: yesno
    ## Default: no
    #
    # Workaround for missing forward of LANG and LC variables
    # of e.g. ssh login connections.
    #
    AUTO_DETECT_UTF8="no"

    ## Type: string
    ## Default: ""
    #
    # List of installed language supports, use by YaST2
    #
    INSTALLED_LANGUAGES="en_US"

OK. Your system uses the standard English UTF-8 locale. Your system cp command prints the following UTF-8 string:

    cp: cannot stat ‘badfile’: No such file or directory

Which console are you using? Which text encoding does your console use? It seems that your console uses the windows-1252 encoding, and not UTF-8.

I am not sure what you mean by what console I am using. I have a monitor connected to the local video card and am using the default text mode console that starts on F1. This stuff is pretty much default, and is pretty much the same way that it has been configured for the last 6 or 7 releases. As shown above, I have CONSOLE_ENCODING="UTF-8" in the /etc/sysconfig/console file. I do notice in my env list an entry G_FILENAME_ENCODING=@locale,UTF-8,ISO-8859-15,CP1252 that seems to be set by zzz-glib2.sh in the profile.d directory.

OK. So you are using a local Linux virtual text console, and not a remote ssh console, e.g. under Windows. Do you have the same problem in a graphical terminal emulator? Log in to KDE/GNOME/XFCE or anything else, run a terminal emulator there, verify that the terminal is in UTF-8 mode (e.g. in the Terminal menu in gnome-terminal), and display the affected text.

If it is not OK there either, it is a problem of a particular application, and I would need to know which application tried to copy the file named "badfile". The string itself probably comes from coreutils cp, but cp itself prints to the console; only when it is called from another application can its output end up redirected to the syslog.

If it is OK there, then it is a problem of the virtual console setting.

Notes: G_FILENAME_ENCODING has a different meaning. Applications using GLib can display local file names in multiple encodings: if a file name is not valid in the locale's default encoding, GLib tries to interpret it in the other encodings in the list.
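To illustrate the double encoding described earlier in this report, here is a minimal Python sketch. It is not part of the original discussion, and the function names are made up for illustration. It misreads the UTF-8 bytes of the cp message as windows-1252, which is what the iconv command in the report simulates, and then reverses the mistake.

```python
def double_encode(text: str) -> str:
    """Misread the UTF-8 bytes of `text` as windows-1252 (hypothetical helper)."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo double_encode, assuming every character survived the round trip."""
    return garbled.encode("cp1252").decode("utf-8")


message = "cp: cannot stat \u2018badfile\u2019: No such file or directory"
garbled = double_encode(message)

# ascii() keeps the output independent of the terminal's own encoding.
print(ascii(garbled))          # each quote has become three characters
print(ascii(repair(garbled)))  # the original message with U+2018 / U+2019
```

If the text looks garbled in this way only on one kind of terminal, the application is probably not the layer doing the second encoding.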
The problem could have existed in the past as well, but it was hidden, because coreutils in 11.4 used the quotes `' (pure ASCII characters) and now uses ‘’ (Unicode General Punctuation U+2018 LEFT SINGLE QUOTATION MARK and U+2019 RIGHT SINGLE QUOTATION MARK).

It displays correctly in an xterm window. I am just typing 'cp' at the command prompt, so I guess that the calling program would be /bin/bash.

Now I understand: you mean a "system error" printed on the console, not a "system error" in the system log. OK. It looks like a virtual console problem that did not properly switch to UTF-8.

The reason for the unprintable characters is simple: lat9w-16.psfu does not contain general punctuation marks. The same is true of most (if not all) other fonts. Fixing it is not simple: either hack the Unicode maps to map these quotes to ASCII quotes, or add the characters to a spare area (if there is any) of these fonts.

I assume that you are referring to the CONSOLE_FONT entry in /etc/sysconfig/console. That was supplied by the install program, and I have no desire to make changes to it. What should I change it to?

Currently there is no straightforward fix. The fonts or Unicode maps need to be extended to include the general punctuation quotes. Any Unicode character written to the virtual console is translated, using the Unicode map, to one of the (256 or 512) characters available in the console font. All other characters are mapped to an invalid-character glyph (e.g. a question mark in inverse colors).

Our console fonts have only 256 positions, and all 256 are already used. There is no way to add characters in a compatible way. We can only modify the Unicode maps of all fonts and map all "“”„‟ to ", all '‘’‚‛ to ' and all -‐‑‒–—―⁃− to -. There is no usable character to display the ellipsis (…).

*** Bug 904252 has been marked as a duplicate of this bug. ***

Bug 932616 discovered a new affected character: BLACK CIRCLE (U+25CF).

Although the bug itself is trivial and a partial fix is easy (for one locale and one font at one site), fixing it completely will need a lot of work. Our fonts come from the early age of 8-bit displays. Almost none of them contain extended characters. These fonts were designed in the MS-DOS era using commercial or freeware editors. I am not aware of any editor for Linux. Fortunately, at least psftools exists, which makes it possible to edit fonts in a text editor.

Current status: some of the kbd fonts (but not all) contain a Unicode character map, but none of them render or map the interesting and frequently used characters.

So what will be needed to get this fixed:

1. Get psftools: https://build.opensuse.org/package/show/graphics/psftools
2. Disassemble all fonts.
3. Review all fonts one by one. Request (upstream) deletion of low quality fonts implementing only a subset of the selected map.
4. Draw the requested characters or create a smart map.
5. Review all Unicode maps bundled with kbd.
6. Either remove the old Unicode map from all fonts, or embed a good Unicode map.
7. Compile all fonts again.
8. Send the new files upstream.

Note that some older fonts may need a different tool to decompose them. Nearly all of this work could be done even by a non-programmer, but it will be time-consuming. If anybody volunteers to do the font review work and discuss it with upstream, I can prepare all the needed tools or files.

*** Bug 932616 has been marked as a duplicate of this bug. ***

However, the VT not properly switching to UTF-8 is a problem separate from the missing U+XXXX->glyph position mappings in *fonts*.

The VT not properly switching to UTF-8 is reported as bug 904214, and it is not a problem of kbd itself. It is partially fixed in later releases of systemd, which set Unicode mode properly.
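Whether a given console font maps a character can be checked from its embedded Unicode table. The following is a rough Python sketch, assuming the PSF1 and PSF2 file layouts as I understand them from the kbd documentation (an assumption, not something stated in this report). The function name is hypothetical, multi-character sequences are ignored, and fonts shipped gzip-compressed would need to be decompressed first.

```python
import struct

PSF1_MAGIC = b"\x36\x04"
PSF2_MAGIC = b"\x72\xb5\x4a\x86"


def mapped_code_points(font: bytes) -> set:
    """Return the single code points listed in a PSF font's Unicode table.

    Hypothetical helper; returns an empty set if the font has no table.
    """
    found = set()
    if font[:4] == PSF2_MAGIC:
        # After the magic: version, header size, flags, glyph count,
        # bytes per glyph (height and width follow but are not needed).
        _version, header_size, flags, count, glyph_size = struct.unpack_from(
            "<5I", font, 4
        )
        if not flags & 0x01:  # bit 0: font carries a Unicode table
            return found
        table = font[header_size + count * glyph_size:]
        # One entry per glyph: UTF-8 text, 0xFE starts a sequence,
        # 0xFF terminates the entry.
        for entry in table.split(b"\xff")[:count]:
            singles = entry.split(b"\xfe")[0]
            found.update(ord(ch) for ch in singles.decode("utf-8"))
        return found
    if font[:2] == PSF1_MAGIC:
        mode, glyph_size = font[2], font[3]
        if not mode & 0x02:  # bit 1: font carries a Unicode table
            return found
        count = 512 if mode & 0x01 else 256
        table = font[4 + count * glyph_size:]
        in_sequence = False
        # 16-bit little-endian values: 0xFFFF ends an entry,
        # 0xFFFE starts a sequence.
        for (value,) in struct.iter_unpack("<H", table[: len(table) // 2 * 2]):
            if value == 0xFFFF:
                in_sequence = False
            elif value == 0xFFFE:
                in_sequence = True
            elif not in_sequence:
                found.add(value)
        return found
    raise ValueError("not a PSF1 or PSF2 font")
```

For an uncompressed font file read into `data`, the expression `0x2018 in mapped_code_points(data)` would tell whether LEFT SINGLE QUOTATION MARK is mapped to any glyph.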
As for the fonts, this is work for upstream and should be reported there, or simply done by somebody interested, as we do not have the resources for this bug.
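The Unicode map fallback proposed earlier in this report (typographic quotes and dashes mapped to their ASCII look-alikes) can be sketched in a few lines of Python. This only illustrates the mapping on strings. It does not modify any console font or Unicode map, and the names are invented for the example.

```python
# Fallback table built from the character lists given in the report:
# each ASCII character stands in for the code points listed next to it.
FALLBACKS = {
    '"': "\u201c\u201d\u201e\u201f",
    "'": "\u2018\u2019\u201a\u201b",
    "-": "\u2010\u2011\u2012\u2013\u2014\u2015\u2043\u2212",
}

# str.translate expects a mapping from code point to replacement string.
TABLE = {
    ord(ch): ascii_char
    for ascii_char, chars in FALLBACKS.items()
    for ch in chars
}


def to_console_ascii(text: str) -> str:
    """Replace typographic quotes and dashes with ASCII (hypothetical helper)."""
    return text.translate(TABLE)


print(to_console_ascii(
    "cp: cannot stat \u2018badfile\u2019: No such file or directory"
))
# prints: cp: cannot stat 'badfile': No such file or directory
```

The ellipsis (U+2026) is left untouched here, matching the remark that no single usable replacement character exists for it.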