Bug 780862

Summary: Unprintable characters in system error messages
Product: [openSUSE] openSUSE 12.2
Reporter: Don Hughes <support>
Component: Basesystem
Assignee: Stanislav Brabec <sbrabec>
Status: RESOLVED UPSTREAM
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: jengelh, ohering
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 12.2
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Don Hughes 2012-09-17 23:38:40 UTC
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1

The system was upgraded from 11.4 to 12.2, and since the upgrade system error messages contain unprintable characters, as in the following example:

cp: cannot stat ‘badfile’: No such file or directory


Contents of /etc/sysconfig/console:

## /etc/sysconfig/console  (MT)
#

## Path:        Hardware/Console
## Description: Text console settings (see also Hardware/Keyboard)
## Type:        string
## Default:     ""
## ServiceRestart: kbd
#
# Console settings.
# Note: The KBD_TTY setting from Hardware/Keyboard (sysconfig/keyboard)
# also applies for the settings here.
#
# Load this console font on bootup:
# (/usr/share/kbd/consolefonts/)
#
CONSOLE_FONT="lat9w-16.psfu"

## Type:        string
## Default:     ""
#
# Some fonts come without a unicode map.
# (.psfu fonts supposedly have it, others often not.)
# You can then specify the unicode mapping of your font
# explicitly. (/usr/share/kbd/unimaps/)
# Normally not needed.
#
CONSOLE_UNICODEMAP=""

## Type:        string
## Default:     ""
#
# Most programs output 8 bit characters, so you need a table to
# translate those characters into unicode. That one can be specified
# here. (/usr/share/kbd/consoletrans/)
# (Note: If your console is in utf-8 mode you don't need this.)
# If your code does not use a unicode mapping at all (because you
# e.g. explicitly specified UNICODEMAP="none") you may circumvent
# the translation via unicode, but load a map which directly maps
# 8 bit output of your program to a font position.
#
CONSOLE_SCREENMAP="trivial"

## Type:        string
## Default:     ""
#
# for some fonts the console has to be initialized with CONSOLE_MAGIC.
# CONSOLE_MAGIC can be empty or have the values "(B", ")B", "(K" or ")K".
# Normally not needed (automatically handled by setfont).
#
CONSOLE_MAGIC="(K"

## Path:        System/Console/Framebuffer
## Description: Framebuffer configuration
## Type:        string
## Default:     ""
#
# You may want to load a framebuffer display driver into your kernel
# in order to be able to change graphics modes etc. with fbset in
# console mode.
#
# Notes: Most people won't enter anything here, as:
#   * it won't work if you have vesafb already active
#   * its advantageous to have fb support compiled into your kernel
#   * Some XFree86 drivers (especially in XFree86-4.x) don't work
#     too well, if you enable framebuffer text mode.
#
# Example:
#  FB_MODULES="matroxfb_base vesa=0x182 fv=85 matroxfb_maven matroxfb_crtc2"
#
FB_MODULES=""

## Type:        string
## Default:     ""
#
# In case your kernel has framebuffer support (or you loaded the framebuffer
# support into your kernel as a module above), you may want to change the
# resolution or other parameters. This is done by secifying the parameters
# to fbset. Use a mode from /etc/fb-modes and additional parameters as
# -a, -depth <BPP>, -vyres <VYRES>, ... (See fbset manpage and/or fbset -h).
#
# Notes:
#   * vesafb does not (currently) support changing the display mode
#   * BEWARE! Don't set modes your monitor can't do. Watch out for the maximum
#     horizontal frequency. Old monitors might even be damaged if you exceed
#     their capabilities.
#
# Example:
#   FBSET_PARAMS="-a -depth 16 768x576-90 -vyres 10240"
#
FBSET_PARAMS=""

# Encoding used for output of non-ascii characters.
#
CONSOLE_ENCODING="UTF-8"




Reproducible: Always

Comment 1 Kun Kun Zhang 2012-09-19 10:24:14 UTC
Hi, could you please have a look at this? I am not sure whether it is right to assign it to you. Feel free to reassign it. Thanks.
Comment 2 Stanislav Brabec 2013-02-06 14:52:39 UTC
Some application wants to log the UTF-8 string ‘badfile’, but one of the layers incorrectly assumes that the application sends the string in windows-1252 encoding and ‘encodes’ it to UTF-8 a second time.

This is the result in UTF-8 locale:

echo '‘badfile’' | iconv -s -f windows-1252 -t UTF-8
â€˜badfileâ€™
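
The same double encoding can be reproduced outside the shell. This is only a minimal Python sketch, assuming the misbehaving layer decodes the UTF-8 bytes as windows-1252 (called cp1252 in Python); the function name is illustrative and not part of any tool mentioned in this report:

```python
def misread_as_cp1252(text):
    """Encode text as UTF-8, then wrongly decode those bytes as windows-1252."""
    return text.encode("utf-8").decode("cp1252")


original = "\u2018badfile\u2019"  # 'badfile' wrapped in U+2018 / U+2019
garbled = misread_as_cp1252(original)

# ascii() escapes non-ASCII characters, so the result is printable on any console.
print(ascii(garbled))

# The damage is reversible as long as no byte was dropped on the way.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

Each curly quote is three bytes in UTF-8, and each of those bytes is read as one windows-1252 character, which is why every quote turns into three characters.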

Could you provide more information? Which application logs this string? What is the locale setting in your /etc/sysconfig/language?
Comment 3 Stanislav Brabec 2013-02-06 14:58:51 UTC
Well, this could also happen if the application works correctly but you are viewing the log in an application that assumes a different text encoding and does not understand UTF-8.
Comment 4 Don Hughes 2013-02-07 02:15:57 UTC
The example is from doing a system cp command with the output displayed on the console (or captured to a file).

Here is my ../sysconfig/language

## Path:	System/Environment/Language
## Description:	
## Type:	string(POSIX,ca_ES.ISO-8859-1,ca_ES.UTF-8,cs_CZ.ISO-8859-2,cs_CZ.UTF-8,da_DE@euro,da_DK.ISO-8859-1,da_DK.UTF-8,de_DE@euro,de_DE.ISO-8859-1,de_DE.UTF-8,el_GR.ISO-8859-7,el_GR.UTF-8,en_GB.ISO-8859-1,en_GB.UTF-8,en_IE@euro,en_IE.ISO-8859-1,en_US.ISO-8859-1,es_ES@euro,es_ES.ISO-8859-1,es_ES.UTF-8,fr_FR@euro,fr_FR.ISO-8859-1,fr_FR.UTF-8,gl_ES@euro,gl_ES.ISO-8859-1,gl_ES.utf-8,hr_HR.ISO-8859-2,hu_HU.ISO-8859-2,hu_HU.UTF-8,it_IT@euro,it_IT.ISO-8859-1,it_IT.UTF-8,ja_JP.eucJP,ja_JP.UTF-8,lt_LT.ISO-8859-13,lt_LT.UTF-8,nl_NL@euro,nl_NL.ISO-8859-1,nl_NL.UTF-8,ru_RU.ISO-8859-5,ru_RU.KOI8-R,ru_RU.UTF-8,sk_SK.ISO-8859-2,sk_SK.UTF-8,tr_TR.ISO-8859-9,tr_TR.UTF-8,ko_KR.eucKR,ko_KR.UTF-8,zh_TW.Big5,zh_TW.UTF-8,zh_CN.GB2312,zh_CN.UTF-8)
## Default:	""
## Config:      OpenOffice.org,groff,ispell,kde,kdm,profiles,susehelp,susewm,tetex,wdm
#
#
# Local users will get RC_LANG as their default language, i.e. the
# environment variable $LANG . $LANG is the default of all $LC_*-variables,
# as long as $LC_ALL is not set, which overrides all $LC_-variables.
# Root uses this variable only if ROOT_USES_LANG is set to "yes".
#
RC_LANG="en_US.UTF-8"

## Type:	string
## Default:	""
#
# This variable will override all LC-variables!!
# Again, ROOT_USES_LANG must be set to "yes", if an effect on the superuser
# account is desired.
#
RC_LC_ALL=""

## Type:	string
## Default:	""
#
# This defines the locale in which messages of programs and
# libraries with i18n-support should appear if a translated
# message catalog for the library or the program is installed.
# This also provides localized yes/no answers.
#
RC_LC_MESSAGES=""

## Type:	string
## Default:	""
#
# This defines the locale for character handling and classification.
# The libc uses this value in language dependent function calls, such
# as e.g. uppercase/lowercase mapping of foreign characters.
#
RC_LC_CTYPE=""

## Type:	string
## Default:	""
#
# This defines the locale for sorting strings and characters.
# It is used by the libc to obtain the alphabetical order of characters
# (e.g. for string comparisons).
#
RC_LC_COLLATE=""

## Type:	string
## Default:	""
#
# This defines the locale for date and time output formats.
# i.e.: 06/09/1999 vs. 09.06.1999
#
RC_LC_TIME=""

## Type:	string
## Default:	""
#
# This defines the locale for formatting and reading numbers.
# i.e.: 1,234.56 vs. 1.234,56
#
RC_LC_NUMERIC=""

## Type:	string
## Default:	""
#
# This defines the locale for formatting and reading money values.
#
RC_LC_MONETARY=""

## Type:	string
## Default:	""
#
# This defines the locale for format of paper.
#
RC_LC_PAPER=""

## Type:	string(ctype)
## Default:	ctype
#
# This defines if the user "root" should use the locale settings
# which are defined here.
# Value "ctype" means that root uses just LC_CTYPE.
#
ROOT_USES_LANG="yes"

## Type:        yesno
## Default:     no
#
# Workaround for missing forward of LANG and LC variables
# of e.g. ssh login connections.
#
AUTO_DETECT_UTF8="no"

## Type:        string
## Default:     ""
#
# List of installed language supports, use by YaST2
#
INSTALLED_LANGUAGES="en_US"
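
The precedence described in the comments of the file above (LC_ALL overrides every LC_* variable, and LANG is the default for all of them) can be sketched as follows. This is a simplified illustration, not the libc implementation; the function name is hypothetical, and the final fallback to "C" is an assumption about the usual behaviour when nothing is set:

```python
def effective_locale(category, env):
    """Return the locale name that applies to one category, e.g. "LC_CTYPE".

    Lookup order: LC_ALL, then the category's own variable, then LANG.
    Empty values are treated like unset ones.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"  # assumed default when no variable is set


# RC_LANG="en_US.UTF-8" from the file above ends up as $LANG for local users.
print(effective_locale("LC_CTYPE", {"LANG": "en_US.UTF-8"}))
print(effective_locale("LC_CTYPE", {"LANG": "en_US.UTF-8", "LC_ALL": "POSIX"}))
```

To check a live session, `os.environ` can be passed as the `env` argument.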
Comment 5 Stanislav Brabec 2013-02-08 14:15:47 UTC
OK. Your system uses the standard English UTF-8 locale. Your system's cp command prints the following UTF-8 string:

cp: cannot stat ‘badfile’: No such file or directory

Which console are you using? Which text encoding does your console use? It seems that your console uses the windows-1252 encoding, not UTF-8.
Comment 6 Don Hughes 2013-02-09 00:03:04 UTC
I am not sure what you mean by what console I am using.  I have a monitor connected to the local video card and am using the default text mode console that starts on F1.  This stuff is pretty much default, and is pretty much the same way that it has been configured for the last 6 or 7 releases.  As shown above, I have CONSOLE_ENCODING="UTF-8" in the /etc/sysconfig/console file.

I do notice in my env list an entry G_FILENAME_ENCODING=@locale,UTF-8,ISO-8859-15,CP1252 that seems to be set by zzz-glib2.sh in the profile.d directory.
Comment 7 Stanislav Brabec 2013-02-11 15:11:55 UTC
OK. So you are using a local Linux virtual text console, not a remote ssh session, e.g. from Windows.

Do you have the same problem in a graphical terminal emulator? Log in to KDE/GNOME/XFCE or any other desktop, run a terminal emulator there, verify that the terminal is in UTF-8 mode (e.g. in the Terminal menu of gnome-terminal), and display the affected text.

If it is broken there as well, it is a problem of a particular application, and I would need to know which application tried to copy the file named "badfile". The string itself probably comes from coreutils cp, but cp itself prints to the console; only when it is called from another application can that application redirect its output to the syslog.

If it is OK there, then it is a problem of the virtual console settings.

Notes:

G_FILENAME_ENCODING has a different meaning: applications using GLib can display local file names in multiple encodings. If a file name is not valid in the locale's default encoding, GLib tries to interpret it in the other encodings in the list.

The problem could have existed in the past as well, but it was hidden: coreutils in 11.4 used the quotes `' (pure ASCII characters), and now uses ‘’ (Unicode General Punctuation U+2018 LEFT SINGLE QUOTATION MARK and U+2019 RIGHT SINGLE QUOTATION MARK).
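
To see which characters in a message fall outside ASCII, something like the following Python sketch may help. The helper name is made up for illustration, and the old-style message is an assumed reconstruction of the 11.4 output based on the quoting described above:

```python
import unicodedata


def describe_non_ascii(message):
    """Return 'U+XXXX NAME' for each distinct non-ASCII character, in order."""
    seen = []
    for ch in message:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in seen]


new_style = "cp: cannot stat \u2018badfile\u2019: No such file or directory"
old_style = "cp: cannot stat `badfile': No such file or directory"  # assumed 11.4 form

for line in describe_non_ascii(new_style):
    print(line)
print(describe_non_ascii(old_style))  # empty list: pure ASCII, any font can show it
```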
Comment 8 Don Hughes 2013-02-11 15:47:23 UTC
Displays correctly in an xterm window.

I am just typing 'cp' at the command prompt, so I guess that the calling program would be /bin/bash
Comment 9 Stanislav Brabec 2013-02-11 16:02:37 UTC
Now I understand: you mean a "system error" printed on the console, not a "system error" in the system log.

OK. It looks like a virtual console problem that did not properly switch to UTF-8.
Comment 10 Stanislav Brabec 2013-02-12 19:17:18 UTC
The reason for the unprintable characters is simple: lat9w-16.psfu does not contain general punctuation marks. The same is true of most (if not all) other fonts.

Fixing it is not simple: either hack the Unicode maps to map these quotes to ASCII quotes, or add these characters to a spare area (if there is any) of these fonts.
Comment 11 Don Hughes 2013-02-12 21:10:04 UTC
I assume that you are referring to the CONSOLE_FONT entry in /etc/sysconfig/console.  That was supplied by the install program, and I have no desire to make changes to it.  What should I change it to?
Comment 12 Stanislav Brabec 2013-02-13 18:59:45 UTC
Currently there is no straightforward fix.

The fonts or Unicode maps need to be extended to include the general punctuation quotes.

Any Unicode character written to the virtual console is translated, using the Unicode map, to one of the 256 or 512 characters available in the console font. All other characters are mapped to an invalid-character glyph (e.g. a question mark in inverse colors).
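
As a rough model of that lookup (not the kernel's actual code), the Unicode map can be pictured as a dictionary from code point to glyph position, with one fallback glyph for everything else. All names and the toy map below are hypothetical:

```python
REPLACEMENT_GLYPH = 0  # hypothetical slot that holds the "invalid character" glyph


def to_glyph_positions(text, unicode_map):
    """Translate each character to a font position, falling back when unmapped."""
    return [unicode_map.get(ord(ch), REPLACEMENT_GLYPH) for ch in text]


# Toy map: printable ASCII only, stored at the position equal to its code point.
toy_map = {cp: cp for cp in range(0x20, 0x7F)}

before = to_glyph_positions("\u2018a\u2019", toy_map)

# Adding map entries that alias the curly quotes to the ASCII apostrophe glyph
# needs no new glyph position in the font.
patched_map = dict(toy_map)
patched_map[0x2018] = 0x27
patched_map[0x2019] = 0x27
after = to_glyph_positions("\u2018a\u2019", patched_map)

print(before, after)
```

The point of the model is that several code points may share one glyph position, so a map-only fix does not depend on free space in the font.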

Our console fonts have only 256 positions, and all 256 are already used. There is no way to add characters in a compatible way.

We can just modify UNICODE maps of all fonts and map all "“”„‟ to ", all '‘’‚‛ to ' and all -‐‑‒–—―⁃− to -. There is no usable character to display ellipsis (…).
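
The proposed substitutions can be written down as a table. The Python sketch below applies them to text in user space, which is not the same as changing the Unicode maps of the fonts; it is only meant to make the mapping concrete, and the names are illustrative:

```python
# ASCII replacement -> typographic variants listed in the comment above.
FALLBACKS = {
    '"': "\u201c\u201d\u201e\u201f",
    "'": "\u2018\u2019\u201a\u201b",
    "-": "\u2010\u2011\u2012\u2013\u2014\u2015\u2043\u2212",
}

# str.translate() expects a mapping keyed by code point.
TABLE = {
    ord(variant): ascii_char
    for ascii_char, variants in FALLBACKS.items()
    for variant in variants
}


def asciify_punctuation(text):
    """Replace typographic quotes and dashes with their ASCII look-alikes."""
    return text.translate(TABLE)


print(asciify_punctuation("cp: cannot stat \u2018badfile\u2019: No such file or directory"))
```

The ellipsis (U+2026) is deliberately left out of the table, matching the remark that no usable single character exists for it.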
Comment 13 Stanislav Brabec 2014-11-06 19:50:56 UTC
*** Bug 904252 has been marked as a duplicate of this bug. ***
Comment 14 Stanislav Brabec 2015-05-28 14:27:37 UTC
Bug 932616 revealed another affected character: BLACK CIRCLE (U+25CF)

Although the bug itself is trivial and a partial fix is easy (for one locale and one font at one site), fixing it completely will need a lot of work:

Our fonts come from the early age of 8-bit displays. Almost none of them contain extended characters. These fonts were designed in the age of MS-DOS using commercial or freeware editors.

I am not aware of any such font editor for Linux. Fortunately, at least psftools exists. It makes it possible to edit fonts in a text editor.


Current status:

Some of the kbd fonts (but not all) contain a Unicode character map.

But none of them renders or maps the interesting and frequently used characters.
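
Whether a given font file carries a Unicode map, and how many glyph positions it has, can be read from the PSF header. The following Python sketch parses the PSF1 and PSF2 headers as I understand the format; it is not taken from kbd or psftools, the function name is hypothetical, and it expects uncompressed data:

```python
import struct

PSF1_MAGIC = b"\x36\x04"
PSF2_MAGIC = b"\x72\xb5\x4a\x86"

PSF1_MODE512 = 0x01     # font has 512 glyphs instead of 256
PSF1_MODEHASTAB = 0x02  # font carries a Unicode table
PSF1_MODEHASSEQ = 0x04  # Unicode table contains sequences
PSF2_HAS_UNICODE_TABLE = 0x01


def inspect_psf_header(data):
    """Return glyph count and Unicode-table presence for raw PSF font data."""
    if data[:2] == PSF1_MAGIC and len(data) >= 4:
        mode, charsize = data[2], data[3]
        return {
            "version": 1,
            "glyphs": 512 if mode & PSF1_MODE512 else 256,
            "has_unicode_table": bool(mode & (PSF1_MODEHASTAB | PSF1_MODEHASSEQ)),
            "bytes_per_glyph": charsize,
        }
    if data[:4] == PSF2_MAGIC and len(data) >= 32:
        # Seven little-endian 32-bit fields follow the magic number.
        _version, _headersize, flags, length, charsize, _height, _width = (
            struct.unpack_from("<7I", data, 4)
        )
        return {
            "version": 2,
            "glyphs": length,
            "has_unicode_table": bool(flags & PSF2_HAS_UNICODE_TABLE),
            "bytes_per_glyph": charsize,
        }
    raise ValueError("not a PSF font (is the file still compressed?)")


# Synthetic PSF1 header: 256 glyphs, Unicode table present, 16 bytes per glyph.
print(inspect_psf_header(b"\x36\x04\x02\x10"))
```

Installed console fonts are often stored gzip-compressed, in which case the data would have to be read through `gzip.open(path, "rb")` first; the exact file name and location on a given system is an assumption to verify locally.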

So this is what will be needed to get this fixed:

1. Get psftools: https://build.opensuse.org/package/show/graphics/psftools

2. Disassemble all fonts.

3. Review all fonts one by one. Request (upstream) deletion of low-quality fonts that implement only a subset of the selected map.

4. Draw requested characters or create a smart map.

5. Review all Unicode maps bundled with kbd.

6. Either remove the old Unicode map from all fonts, or embed a good Unicode map.

7. Compile all fonts again.

8. Send the new files upstream.

Note that some older fonts may need a different tool to disassemble them.


Nearly all of this work could be done even by a non-programmer, but it will be time-consuming.

If anybody volunteers to do the font review and discuss it with upstream, I can prepare all the needed tools or files.
Comment 15 Stanislav Brabec 2015-05-28 14:28:03 UTC
*** Bug 932616 has been marked as a duplicate of this bug. ***
Comment 16 Jan Engelhardt 2015-05-28 14:49:28 UTC
However, the VT not properly switching to UTF-8 is a problem separate from missing U+XXXX->glyph position mappings in *fonts*.
Comment 17 Stanislav Brabec 2015-05-28 15:11:58 UTC
The VT not properly switching to UTF-8 is reported as bug 904214, and it is not a problem of kbd itself.
Comment 18 Tomáš Chvátal 2017-08-11 15:48:35 UTC
This is partially fixed in later releases of systemd, which set Unicode properly.

For the fonts, this is work for upstream and as such should be reported there, or simply done by somebody interested, as we do not have resources for this bug.