Re: [AD] unicode proposal



On January 21, 2009, Elias Pschernig wrote:
> On Wed, 2009-01-21 at 11:05 -0700, Thomas Fjellstrom wrote:
> > Hmm, right now the string you get from readdir is a byte for byte copy of
> > the string the OS returns. So it ought to work, but may not be
> > displayable in any useful form.
> >
> > I intend to make sure the functions under Windows always use the wide
> > char funcs (16bit unicode), and down convert to UTF-8. That shouldn't
> > break anything afaik. What might break is if the files are actually in
> > some strange encoding like sjis or whatnot (which they shouldn't be in
> > NT-based OSs, but you never know). With Linux, it may break files that
> > aren't in the right encoding to begin with. We might be better off not
> > converting at all if we want this to work... But honestly, why do you
> > have mixed filenames? Especially in an Allegro app: a very large
> > percentage of applications using Allegro will only use files they ship.
>
> Yeah, the only time we need conversion then is in Windows - but that's
> no problem as it's 1:1.
>
> > Do we "fix" the issue by catering to a (very) small minority of users?
> > And then force everyone else to convert from the OS format (UCS2 or
> > ascii, or some strange encoding that they might not be able to detect) to
> > UTF8 (or ascii maybe) every time they interact with the filesystem?
>
> Hm, not sure I follow.. it would be required in Windows, as you could
> not display the filenames otherwise.

I'm not so sure that's the only time. Linux doesn't much care what encoding
the filenames are actually in, and it's rather hard to detect properly (if
it's possible at all). See
http://search.cpan.org/search?query=Encode%3A%3AGuess&mode=all
for some code that attempts it and does a somewhat reasonable job. I once
used that module to convert some music tags to UTF-8 mostly automatically,
from whatever they happened to be (a mix of Asian scripts).

The problem, then, is that converting an "sjis" string to UTF-8 without
knowing it's sjis will corrupt the string beyond recognition.
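
To make the point concrete, here is a minimal sketch in C of the one check
that is cheap and reliable: testing whether a byte string is well-formed
UTF-8 before treating it as such. The function name looks_like_utf8 is made
up for this example and is not part of Allegro. Note the limits: passing the
check does not prove the name really is UTF-8 (plain ASCII and some legacy
byte sequences pass too), and failing it does not tell you which encoding
the name is actually in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an Allegro function.
 * Returns 1 if the n bytes at s are well-formed UTF-8, 0 otherwise.
 * Rejects overlong forms, surrogates (U+D800..U+DFFF) and anything
 * above U+10FFFF. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL || n == 0);

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t len, k;

        if (c < 0x80) {                       /* ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {      /* 2-byte sequence */
            len = 2; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {      /* 3-byte sequence */
            len = 3; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {      /* 4-byte sequence */
            len = 4; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;  /* stray continuation byte or invalid lead byte */
        }

        if (n - i < len)
            return 0;  /* sequence is cut off at the end of the buffer */

        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;  /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;  /* overlong, out of range, or a surrogate */

        i += len;
    }
    return 1;
}
```

For instance, the Shift-JIS bytes 93 FA 96 7B fail this check because 0x93
can only be a continuation byte in UTF-8, while the UTF-8 form of the same
two characters (E6 97 A5 E6 9C AC) passes.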

> > Maybe we can provide a "displayname" corollary to the "al_fs_entry_name"
> > function: the entry keeps two versions, one in UTF8 to hand out to the
> > user to display, and one in whatever format it was given to give back to
> > the os? That might make the ALLEGRO_PATH stuff interesting. It parses and
> > builds its strings using U_CURRENT iirc.
>
> Well, I feel things get a lot simpler if we get rid of all the "current
> encoding" semantics and just say our whole API is UTF-8. That's the
> whole point of this thread :)

Yes indeed. But it doesn't really make it simpler. The code still has to
support UCS2 (or whatever Windows is in), and UTF8.
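
As a rough illustration of the Windows-side work, here is a sketch in C of
the down-conversion from 16-bit code units to UTF-8. It assumes the wide
strings are UTF-16 (so surrogate pairs can occur), and the name
utf16_to_utf8 is hypothetical, not an existing Allegro function. On Windows
itself, WideCharToMultiByte with CP_UTF8 would be the more likely route;
this just shows what the conversion involves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an Allegro function.
 * Converts n UTF-16 code units at src into UTF-8 bytes at dst, writing at
 * most cap bytes and no terminator. Returns the number of bytes written,
 * or (size_t)-1 on an unpaired surrogate or if dst is too small. */
size_t utf16_to_utf8(const uint16_t *src, size_t n,
                     unsigned char *dst, size_t cap)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);

    while (i < n) {
        uint32_t cp = src[i++];
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i >= n || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;  /* low surrogate with no high one before it */
        }

        /* Number of UTF-8 bytes this code point needs. */
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cap - out < need)
            return (size_t)-1;  /* destination buffer too small */

        if (need == 1) {
            dst[out++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A caller would size dst at up to 3 bytes per input code unit, which is the
worst case: a surrogate pair uses two units but only 4 bytes.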

> --
> Elias Pschernig <elias@xxxxxxxxxx>
>


-- 
Thomas Fjellstrom
tfjellstrom@xxxxxxxxxx



