RE: [AD] Proposal to kill non-UTF-8 support



Eric Botcazou writes:
>> I assumed the user would use UTF-8.
> 
> Yes, this proposal would practically prevent the user from using 16-bit
> Unicode as the text encoding format. 

Not really: it would just force them to convert strings when talking to
Allegro. This isn't always as hard as it might seem, especially in C++.

I've worked on a couple of fairly large GUI apps that freely mixed 
UTF8 and various codepages internally, and just overloaded the string
class once to handle whatever conversions were needed. It took about
4 hours to write the class, after which there was no problem passing
any format of string object directly to a Unicode Windows API function.
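To make that concrete, here is a minimal sketch of such a class. All names here are hypothetical (this isn't code from any real app): a wrapper that accepts either UTF-8 or UTF-16, converts to UTF-8 up front, and then converts implicitly to a plain C string, so one overload set handles every call site. A real version would validate its input and cover more directions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch: accept UTF-8 or UTF-16, normalise to UTF-8 once,
// then convert implicitly so the object can go straight into narrow-string APIs.
class AnyString {
public:
    AnyString(const std::string &utf8) : utf8_(utf8) {}
    AnyString(const std::u16string &utf16) : utf8_(fromUtf16(utf16)) {}

    // Implicit conversion: pass an AnyString wherever a const char * is expected.
    operator const char *() const { return utf8_.c_str(); }

private:
    static std::string fromUtf16(const std::u16string &in) {
        std::string out;
        for (std::size_t i = 0; i < in.size(); ++i) {
            std::uint32_t cp = in[i];
            // Combine a UTF-16 surrogate pair into one code point.
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
                std::uint32_t lo = in[++i];
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            }
            // Emit 1-4 UTF-8 bytes depending on the code point's range.
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }

    std::string utf8_;
};
```

With something like this, a 16-bit string from the OS and a UTF-8 string from the library both funnel through the same type, which is why the conversions stop being a daily nuisance.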

Of course, that does make the conversions the problem of the client
rather than the library. Is that a good thing or not?

> Is it the role of a library, especially a cross-platform one, to 
> impose such a choice ? The answer might be yes, if one thinks that 
> UTF-8 is really better than 16-bit Unicode.

I think UTF8 is better, as it is backward compatible with ASCII while
still able to encode the full Unicode range, and it avoids a lot of the
nuisances of using 16-bit text.

As for whether a lib should impose a choice, I think it should try
not to limit what people can do with it, but it also shouldn't go so
far as to directly support every one of a million options when
supporting just one would do.

Otherwise there is a danger of ending up in a situation like the X
protocol with endianness, where every single server and client has
to support both big- and little-endian transfers, and negotiate the
format when they establish a connection. I'm sure someone once
thought that supporting both could make things faster, but in
practice it just means lots of redundant code that could have been
avoided if they'd chosen a single format: then only half the
platforms would need a byte-swapping path, rather than everything
having to convert in all directions!

Maybe the API and internals and storage could all be UTF8, but 
keep functions like uconvert_toascii() and uconvert_tounicode()?
People using other encodings would then have a choice whether to
do things like:

	char *string = al_uconvert_unicode(string got from Windows API)
	do stuff to string, using UTF8
	al_textprintf(string)

or:

	wchar_t *string = string got from Windows API
	do stuff to string, using Unicode
	al_textprintf(al_uconvert_unicode(string))

Addons could easily provide extra convert routines for codepages
if anyone really wants that, and most Allegro code could entirely
ignore different encoding options. But maybe it's too ugly for
users to have to worry about these issues? On the other hand, all
users doing good locale support will need to be somewhat aware of
text formats anyway, so always doing manual conversions might not
be any harder than any other way of dealing with things.
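For the manual-conversion style above, the other direction (the role uconvert_tounicode() plays today) is just as mechanical. A hedged sketch of a standalone helper, decoding UTF-8 into 16-bit units: the function name is mine, not Allegro's, and it assumes well-formed input, which a real version would have to validate.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper in the spirit of uconvert_tounicode(): decode UTF-8
// into 16-bit code units, emitting surrogate pairs for code points > 0xFFFF.
// Assumes well-formed input; a real version would validate each sequence.
std::u16string utf8_to_utf16(const std::string &in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = in[i];
        std::uint32_t cp;
        // The lead byte's high bits give the sequence length (1-4 bytes).
        if (b < 0x80) {
            cp = b; i += 1;
        } else if ((b & 0xE0) == 0xC0) {
            cp = ((b & 0x1F) << 6) | (in[i + 1] & 0x3F); i += 2;
        } else if ((b & 0xF0) == 0xE0) {
            cp = ((b & 0x0F) << 12) | ((in[i + 1] & 0x3F) << 6)
               | (in[i + 2] & 0x3F); i += 3;
        } else {
            cp = ((b & 0x07) << 18) | ((in[i + 1] & 0x3F) << 12)
               | ((in[i + 2] & 0x3F) << 6) | (in[i + 3] & 0x3F); i += 4;
        }
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Split astral code points into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

The point is that both directions fit in a few dozen lines, so keeping them in the library (or an addon) costs little even if the core API settles on UTF8 only.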

> The Win32 API speaks 16-bit Unicode, not UTF-8. So an user willing to
> allegro-ize an unicodified Win32 app (very unlikely) or use Allegro in a
> unicodified Win32 program (more likely) may find the conversion
> cumbersome.

Windows speaks both 16-bit Unicode and every codepage under the sun,
including sort-of UTF8. The range of choices seems to vary according to
which locale version of Windows you have installed, but is usually quite
wide.


-- 
Shawn


