Bugzilla – Full Text Bug Listing
| Summary: | man: new Unicode characters in use | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.4 | Reporter: | Jan Engelhardt <jengelh> |
| Component: | Basesystem | Assignee: | Michal Vyskocil <mvyskocil> |
| Status: | VERIFIED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Minor | ||
| Priority: | P3 - Medium | CC: | davejplater, malcolmlewis, werner |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | maint:released:11.4:41461 | ||
| Found By: | Beta-Customer | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Bug Depends on: | |||
| Bug Blocks: | 698290 | ||
| Attachments: | Test manpage | ||
Description
Jan Engelhardt
2011-03-30 17:50:17 UTC
man uses groff for character mapping and less for output on the terminal.

That seems to be a regression from the dropped bnc446710.patch - see bug 446710.
However it seems the fonts/devutf8/R is not the place for it anymore. With

u2010 24 0 0x002D

in that file I've got

echo "\[u2010]" | nroff -mandoc -Tutf8 | head -n 1 | od -x
0000000 80e2 0a90
0000004

which is the hyphen in UTF-8

only ascii seems to produce the proper replacement

echo "\[u2010]" | nroff -mandoc -Tascii | head -n 1 | od -x
0000000 0a2d
0000004

even though I was not able to find out which .tmac file does this mapping.
There's no big difference in loaded tmac files between devascii and devutf8.
Only in the latter case are unicode.tmac and latin.tmac called after tty.tmac.

The only solution I'm aware of is to reverse the logic of unicode.tmac - instead of
the current mapping of 0x2d to 0x2010 et al.

.\" unicode.tmac
.\"
.char - \[hy]
.char ` \[oq]
.char ' \[cq]
.\" EOF

use

.\" unicode.tmac
.\"
.char \[hy] -
.char \[oq] `
.char \[cq] '
.\" EOF

but that might cause unwanted side-effects in case someone else uses non-tty
output. So maybe we can name it deunicode.tmac and call it in tty.tmac
instead of the unicode one.

Werner: what do you think?

uh forget that - I patched tty.tmac to not include unicode.tmac, which changes the 0x2d to 0x2010. I don't think we need to change it back.

I'm going to send a fix to M17N soon.

The problem has been fixed in M17N[1] groff by commit 12 [2]. The tty.tmac no longer includes unicode.tmac, so ASCII chars will not be replaced. Feel free to test it before I submit it to Factory from the M17N repository [1].

[1] http://download.opensuse.org/repositories/M17N/openSUSE_11.4/
[2] https://build.opensuse.org/package/rdiff?commit=12&linkrev=base&package=groff&project=M17N

I have updated to the package, but still see U+2010 used for wordbreaks.

Can you get me an example? Which man page and under which conditions. Thanks.

Created attachment 427556 [details]
Test manpage
groff-1.20.1-183.1.x86_64.rpm from M17N/openSUSE_11.4.
$ locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC=POSIX
LC_TIME=POSIX
LC_COLLATE=POSIX
LC_MONETARY=POSIX
LC_MESSAGES=nb_NO.UTF-8
LC_PAPER=de_DE.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Running inside xterm-268:
$ man -l test.1 | pcregrep -o '[^\w]+' | sort -u
...
?
When adding | hexdump -C, this will produce "e2 80 90", which is the UTF-8 encoding of U+2010.
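The byte patterns discussed in this thread can be reproduced without groff, which helps when reading the dumps. The following is a minimal sketch; `hex_bytes` is a hypothetical helper name, not part of groff or man. It prints one byte at a time, which also suggests why the earlier `od -x` output reads `80e2 0a90`: `od -x` prints 16-bit words in host byte order, so on a little-endian machine the bytes e2 80 90 0a appear swapped in pairs.

```shell
# hex_bytes: hypothetical helper, prints stdin as one run of hex digits.
# -An drops the offset column, -v disables folding of repeated lines,
# -tx1 prints single bytes, so the result does not depend on byte order.
hex_bytes() {
    od -An -v -tx1 | tr -d ' \n'
}

# U+2010 HYPHEN in UTF-8, written with octal escapes for POSIX printf.
printf '\342\200\220\n' | hex_bytes; echo    # prints e280900a

# ASCII hyphen-minus, the character wanted on a tty.
printf '%s\n' '-' | hex_bytes; echo          # prints 2d0a
```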
Updated patch adds the deunicode.tmac, which turns this unicodization off on tty. Then hexdump -C returns

00000000 2d 0a |-.|
00000002

Committed as revision 13 to M17N/groff.

Submitted into openSUSE:Factory by request 72760 - I assume you can use the version from M17N, so no maintenance update is requested, thus closing.

This is an autogenerated message for OBS integration:
This bug (683857) was mentioned in
https://build.opensuse.org/request/show/72760 Factory / groff

(In reply to comment #2)
> That seems to be a regression from the dropped bnc446710.patch - see bug 446710.
> However it seems the fonts/devutf8/R is not the place for it anymore. With
>
> u2010 24 0 0x002D
>
> in that file I've got
>
> echo "\[u2010]" | nroff -mandoc -Tutf8 | head -n 1 | od -x
> 0000000 80e2 0a90
> 0000004
>
> which is the hyphen in UTF-8
>
> only ascii seems to produce the proper replacement
>
> echo "\[u2010]" | nroff -mandoc -Tascii | head -n 1 | od -x
> 0000000 0a2d
> 0000004
>
> even though I was not able to find out which .tmac file does this mapping.
> There's no big difference in loaded tmac files between devascii and devutf8.
> Only in the latter case are unicode.tmac and latin.tmac called after tty.tmac.
>
> The only solution I'm aware of is to reverse the logic of unicode.tmac - instead of
> the current mapping of 0x2d to 0x2010 et al.
>
> .\" unicode.tmac
> .\"
> .char - \[hy]
> .char ` \[oq]
> .char ' \[cq]
> .\" EOF
>
> use
>
> .\" unicode.tmac
> .\"
> .char \[hy] -
> .char \[oq] `
> .char \[cq] '
> .\" EOF
>
> but that might cause unwanted side-effects in case someone else uses non-tty
> output. So maybe we can name it deunicode.tmac and call it in tty.tmac
> instead of the unicode one.
>
> Werner: what do you think?

I came upon this bug while googling deunicode.tmac due to a new rpmlint error for a few packages' man pages.
This is from lilv, a package I'm preparing for Factory:

lilv.x86_64: W: manual-page-warning /usr/share/man/man1/lv2jack.1.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man1/serdi.1.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man3/lilv.3.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man3/SerdURI.3.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man3/SerdNode.3.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man1/sordi.1.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man3/serd.3.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man3/SerdChunk.3.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man3/sord.3.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man1/lv2ls.1.gz 69: can't find macro file `deunicode.tmac'
lilv.x86_64: W: manual-page-warning /usr/share/man/man1/lv2info.1.gz 69: can't find macro file `deunicode.tmac'
This man page may contain problems that can cause it not to be formatted as intended.

Is there a package that provides deunicode.tmac?

As of

* Mon Jun 06 2011 mvyskocil@suse.cz -
- fix bnc#682913: device X100 is missing
  * create new groff-devx package containing all devX devices, as they need X for build
- fix bnc#683857: Unicode characters in use
  * groff-1.20.1-deunicode.patch adds deunicode.tmac to tty.tmac
    removes all unnecessary unicode characters in tty output

I still get 0x2010 as a dash separator.

Sorry, I accidentally tested the groff from 11.3. However the deunicode.tmac is not the proper solution.
The working one is simple - change the soft-hyphenation char to -

That is what the new version is doing

# To be sure I'm testing the right version!
$ rpm -q --changelog groff | head -n 4
* Wed Jun 08 2011 mvyskocil@suse.cz
- fix bnc#683857: Unicode characters in use properly
  * change the soft hyphenation char to - in tty.tmac
$ man -l test.1 | pcregrep -o '[^\w]+' | sort -u | grep -- '-' | hexdump -C
00000000 2d 0a |-.|
00000002

Committed as revision 17 to M17N/groff

Now does what was wanted.

This is an autogenerated message for OBS integration:
This bug (683857) was mentioned in
https://build.opensuse.org/request/show/73067 11.4 / groff
https://build.opensuse.org/request/show/73070 Factory / groff

Update released for: groff, groff-debuginfo, groff-doc
Products:
openSUSE 11.4 (debug, i586, x86_64)
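The manual hexdump check used in this thread could be turned into a small pass/fail filter for re-testing after a groff update. This is only a sketch, assuming a grep that matches raw bytes under `LC_ALL=C` (GNU grep does); `contains_u2010` is a hypothetical name and not an existing tool.

```shell
# contains_u2010: hypothetical helper; exits 0 if stdin contains the
# UTF-8 byte sequence e2 80 90 (U+2010 HYPHEN), non-zero otherwise.
contains_u2010() {
    # LC_ALL=C makes grep compare bytes rather than decoded characters.
    LC_ALL=C grep -q "$(printf '\342\200\220')"
}

# Self-check with fixed samples, so no man page is needed.
if printf 'word\342\200\220break\n' | contains_u2010; then
    echo "U+2010 found"
fi
if ! printf 'word-break\n' | contains_u2010; then
    echo "ASCII only"
fi
```

With the attached test page, the same filter could be fed from `man -l test.1`; an exit status of 0 would then mean the rendered output still contains U+2010.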