Unicode in console


Hi, SliTaz developers!

These past days have been bad for almost the whole SliTaz network, and
I am close to solving a pretty old problem. But I can't finish and say
"yes, I'm right" or "no, I'm wrong". It seems hg.slitaz.org is read-only
now, so I can't push my commit. Even if Hg worked, the Cooker
(cook.slitaz.org) seems to be completely dead, and not only its web
interface.

Now, the problem in depth.
It did not exist in SliTaz 3.0; it was introduced in 4.0 and is still
present in Rolling.
I can't type my native Cyrillic letters, either in the plain console or
in any terminal emulator (I actually tested xterm and sakura). I see
only question marks (?) instead of the characters I type. So I can't
type Unicode characters at all? No. Some of them I can, for example:
śûũşī. What a strange problem!

Today I wondered why busybox's printf does not work with Unicode. Even
this simple example fails:
busybox printf '%6s\n' "śśśś"
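A quick way to see what is going on with this string, sketched in Python (this is not busybox code, only an illustration): compare its length in characters with its length in UTF-8 bytes, and pad it to a width of 6 both ways. The variable names are made up for the example.

```python
# Sketch only: Python stands in for the shell here; this is not busybox code.
text = "\u015b" * 4          # the test string "śśśś"
width = 6                    # field width from the '%6s' format

chars = len(text)                     # length in Unicode characters
nbytes = len(text.encode("utf-8"))    # length in UTF-8 bytes

# Padding computed per character: two spaces in front of the text.
by_chars = " " * max(0, width - chars) + text
# Padding computed per byte: the string already exceeds the field width.
by_bytes = " " * max(0, width - nbytes) + text

print(chars, nbytes)     # 4 8
print(repr(by_chars))
print(repr(by_bytes))
```

Each ś takes two bytes in UTF-8, so a printf that measures the field in bytes sees 8 and adds no padding at all.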
I want to see two spaces and four Unicode letters (a 4-character text
right-aligned in a 6-character-wide field). But this printf counts
bytes, not characters: each ś is two bytes in UTF-8, so the string is
already 8 bytes long and gets no padding. So I turned to the busybox
config in my local cooking wok:

$wok/busybox/stuff/busybox-1.21.config

.. . .
# General Configuration
.. . .
CONFIG_LOCALE_SUPPORT=y
CONFIG_UNICODE_SUPPORT=y
# CONFIG_UNICODE_USING_LOCALE is not set
# CONFIG_FEATURE_CHECK_UNICODE_IN_ENV is not set
CONFIG_SUBST_WCHAR=63
CONFIG_LAST_SUPPORTED_WCHAR=767
# CONFIG_UNICODE_COMBINING_WCHARS is not set
# CONFIG_UNICODE_WIDE_WCHARS is not set
# CONFIG_UNICODE_BIDI_SUPPORT is not set
# CONFIG_UNICODE_NEUTRAL_TABLE is not set
# CONFIG_UNICODE_PRESERVE_BROKEN is not set
.. . .

A little surprisingly, the numbers are decimal: 63=0x3F (the question
mark) and 767=0x2FF.
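The conversion is easy to double-check. A tiny Python sketch, with the two values copied from the config above (the variable names just mirror the option names):

```python
# Values copied from the busybox config fragment; both are decimal there.
SUBST_WCHAR = 63
LAST_SUPPORTED_WCHAR = 767

print(hex(SUBST_WCHAR), repr(chr(SUBST_WCHAR)))   # 0x3f '?'
print(hex(LAST_SUPPORTED_WCHAR))                  # 0x2ff
```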
A quick search turned up this page:
http://jrgraphix.net/research/unicode_blocks.php

So Busybox can correctly display, as you type them, all Unicode
characters in the range 0x0 .. 0x2FF, and none above it: anything higher
is replaced by the substitution character, the question mark.
What do I need? To use Cyrillic I need the range 0x400 .. 0x4FF and,
maybe, 0x500 .. 0x52F. The basic Latin-based and Greek letters are all
below these ranges.
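To illustrate the behavior described here, a simplified Python model follows. It is an assumption about what the two config values do (every code point above the limit is swapped for the substitution character), not a copy of the busybox code, and `shown_as` is a hypothetical helper name.

```python
LAST_SUPPORTED_WCHAR = 767   # 0x2FF, from the config
SUBST_WCHAR = 63             # '?', from the config

def shown_as(text, last=LAST_SUPPORTED_WCHAR, subst=SUBST_WCHAR):
    """Hypothetical model: replace every code point above `last` with `subst`."""
    return "".join(ch if ord(ch) <= last else chr(subst) for ch in text)

latin = "\u015b\u00fb\u0169\u015f\u012b"   # śûũşī, all below 0x2FF
cyrillic = "\u0421\u043b\u0438"            # three Cyrillic letters, 0x400 .. 0x4FF

print(shown_as(latin))                   # unchanged
print(shown_as(cyrillic))                # ???
print(shown_as(cyrillic, last=0x4FF))    # unchanged once the limit is 1279
```

Under this model the letters śûũşī survive because they all sit below 0x2FF, while every Cyrillic letter turns into a question mark, which matches what I see.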

But what about these ranges?
1E00 - 1EFF  Latin Extended Additional
1F00 - 1FFF  Greek Extended
2000 - 206F  General Punctuation
and some others? Do we need them?

A question for the pros: would it be a big overhead in Busybox if we defined

CONFIG_LAST_SUPPORTED_WCHAR=11263
(this means the range 0x0 .. 0x2BFF)
or 196608=0x30000, to cover almost all of Unicode?
Will Busybox generate a huge table or something like that? If it is a
question of size, I suggest sticking to 1279=0x4FF.
But I warn you that with this limit you can't type smart quotes:
http://jrgraphix.net/r/Unicode/2000-206F
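To compare the candidate limits, here is a small Python sketch that checks which of the ranges mentioned in this mail fall completely under each value. The block list contains only the ranges named above (the label for 0x500 .. 0x52F is mine), and `covered` is a hypothetical helper, so this says nothing about table sizes inside Busybox.

```python
# Ranges named in this mail: (first, last, label).
BLOCKS = [
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0500, 0x052F, "Cyrillic, second range"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x1F00, 0x1FFF, "Greek Extended"),
    (0x2000, 0x206F, "General Punctuation"),
]

# Current value and the three proposed values of CONFIG_LAST_SUPPORTED_WCHAR.
CANDIDATES = [767, 1279, 11263, 196608]

def covered(limit):
    """Return the labels of the blocks that lie entirely at or below `limit`."""
    return [name for first, last, name in BLOCKS if last <= limit]

for limit in CANDIDATES:
    names = ", ".join(covered(limit)) or "none of these"
    print(f"{limit:>6} = 0x{limit:X}: {names}")
```

With 1279 only the first Cyrillic range is covered. The smart quotes live in General Punctuation (for example U+201C), so they need at least 11263 among these candidates.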

Who can try to compile a new busybox with this little patch?
BTW, cook's behavior is really annoying when it treats harmless messages like
"ERRORS: 0"
as real errors. And there are too many false positives in the Busybox output.
Pascal Bellard has pushed too many "remove wrong error handler" commits
to change the text "ERROR" to "error" or just to filter it out...

---
I'm not a busybox specialist, but it seems that another solution is to set
CONFIG_UNICODE_USING_LOCALE=y
to use glibc instead of busybox's built-in functions. Do we have
everything needed for that in the Base flavor? Or in JustX? Or in
neither?

Thank you all for your attention!
I am interested to know your opinion on this issue!

--
SliTaz GNU/Linux Mailing list - http://www.slitaz.org/
