Re: [hatari-devel] Character conversion for filenames in GEMDOS HD emulation



Hi Eero,

> File name encoding doesn't actually come from the system, but from
> the file system.  You may mount disks (memory cards, CDs etc) that
> have different file name encodings than your system.  Whether file
> names show up correctly depends on whether they match your system
> locale charmap, or whether you gave correct file name encoding on
> the mount options when you do the mounting manually (so that kernel
> does the conversion correctly).
> 
> Linux distro disk automounts nowadays default to UTF-8, but that
> doesn't mean that the strings you get are valid UTF-8.
> 
> Anything related to locale is a rat's nest.

Indeed. Good points.

> Here's how locale's character set is detected on Linux:
> http://stackoverflow.com/questions/1492918/how-do-you-get-what-kind-of-
> encoding-your-system-uses-in-c-c
> 
> Something similar would be needed also for Windows.
> 
> With which Windows version and locale is your change tested?

Windows 7 with cp1252 (aka Windows ANSI)

> I kind of doubt modern Windows versions being stuck to fixed size
> 8-bit (cp1252 or other) encodings for file names.  It might use
> UCS-2/UTF-16 or UTF-8...?

You're right. I did some quick research on the web: Windows NTFS stores all filenames internally in Unicode (which is good), but if you use fopen(), filenames are interpreted according to the system codepage (cp1252).

But there seems to be a much better way: Windows also provides Unicode versions of the standard library functions, e.g. _wfopen(). These functions take UTF-16 strings as parameters, which I can easily produce with the code I already have. When using these functions, no codepage/locale handling needs to be done at all, and Unicode-aware Windows applications (like Explorer) will see all Atari ST characters in filenames correctly (similar to Linux with UTF-8).

> At run-time iconv() is nicer alternative, it's already part
> of glibc, so it doesn't add library dependency, and API looks
> much nicer.  You give from/to encodings to iconv_open() and
> the input & output buffers to iconv() calls.
> 
> The issue with iconv() is that at least quick listing of
> encodings it supports didn't seem to have an Atari encoding,
> hopefully I'm wrong.  And I'm not sure what one would use
> on Windows.

I'll have a look at iconv and see whether I can use it under Linux when host locales other than UTF-8 are required. Is iconv always available on all Hatari platforms?

Max


