Re: [hatari-devel] Character conversion for filenames in GEMDOS HD emulation



Please use this corrected version of the patch:

https://gist.github.com/b53a3a3e8b06cdf0881c

This fixes a problem with toupper() in Str_Filename2TOSname() by
preventing it from being called for characters above 0x80 in TOS
pathnames. When a locale was set, those codes were mapped to incorrect
upper-case characters, because the locale is only valid for the host,
not for the character set of the emulated system.
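
In sketch form, the guard looks like this (the helper name is only
illustrative, this is not the exact patch code): the host's toupper()
is trusted for plain ASCII only, and the Atari-specific codes are left
untouched.

#include <ctype.h>

static char TOS_ToUpper(char c)
{
    unsigned char uc = (unsigned char) c;

    if (uc < 0x80)
        return (char) toupper(uc);  /* ASCII: safe in any locale */
    return c;                       /* Atari-specific code: leave as-is */
}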


2014-07-23 23:20 GMT+02:00 Max Böhm <mboehm3@xxxxxxxxx>:
> Hi Thomas, hi Eero,
>
> I've addressed your feedback. Here is the updated patch (this time
> based on the current hg version of Hatari):
>
> https://gist.github.com/604f079206d7e5986d26
>
> It can be applied with: patch -p1 <../hatari-hg.gemdos.patch
>
>> Max, I now had a closer look at your patch, and I think it's basically
>> a good approach, but there are some things that I'd like to discuss:
>>
>> 1) I really dislike this part in gemdos.c:
>> #ifdef WIN32
>>  Str_AtariToWindows(pszFileName, pszFileNameHost, INVALID_CHAR);
>> #else
>>  Str_AtariToUtf8(pszFileName, pszFileNameHost);
>> #endif
>> In the end, there is no need to export both functions to other files,
>> so I think it would be better to have a "Str_AtariToHost" and a
>> "Str_HostToAtari" where the implementation in str.c is taking care of
>> the differences instead.
>
> This is now Str_AtariToHost() and Str_HostToAtari().
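>
> In sketch form (the real code in str.c differs in details, and
> Str_AtariToLocale is only an illustrative name for the locale-based
> helper), the wrapper hides the platform difference from all callers:
>
> #include <stddef.h>   /* size_t */
>
> void Str_AtariToHost(const char *src, char *dst, size_t dstlen,
>                      char replacementChar)
> {
> #ifdef WIN32
>     /* locale-based conversion, see point 2 below */
>     Str_AtariToLocale(src, dst, dstlen, replacementChar);
> #else
>     /* fixed Atari -> UTF-8 mapping, no locale involved */
>     Str_AtariToUtf8(src, dst, dstlen);
> #endif
> }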
>
>> 2) The extra step with mapWindowsToUnicode looks cumbersome ... why
>> don't you add a proper mapAtariToWindows table directly instead?
>
> The Windows-specific mapping tables are no longer needed; I have removed
> them. The conversion on the Windows platform now uses the standard C
> library functions mbtowc() and wctomb(). Those convert between wide
> characters (Unicode code points) and the current locale, so the OS does
> the work. This works, but it does a little more than you would expect:
> on Windows some Greek letters are also converted to similar-looking
> Latin letters. I think this does not hurt. It seems to be a "feature"
> of these functions.
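>
> In sketch form, the Atari -> host direction then works roughly like
> this (the function and table names are only illustrative and the patch
> code differs in details):
>
> #include <limits.h>
> #include <stdlib.h>
> #include <string.h>
>
> /* assumed table: Unicode code points for the Atari codes 0x80..0xFF */
> extern const wchar_t AtariToUnicode[128];
>
> static void Str_AtariToLocale(const char *src, char *dst, size_t dstlen,
>                               char replacementChar)
> {
>     char buf[MB_LEN_MAX];
>     size_t used = 0;
>
>     while (*src && used + MB_LEN_MAX < dstlen)
>     {
>         unsigned char c = (unsigned char) *src++;
>         wchar_t wc = (c < 0x80) ? (wchar_t) c : AtariToUnicode[c - 0x80];
>         int n = wctomb(buf, wc);    /* render in the current locale */
>
>         if (n <= 0)
>         {
>             dst[used++] = replacementChar;  /* not representable here */
>             continue;
>         }
>         memcpy(dst + used, buf, n);
>         used += n;
>     }
>     dst[used] = '\0';
> }
>
> The host -> Atari direction goes through mbtowc() and a reverse lookup
> in the same table.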
>
>> 3) Str_AtariToUtf8 can create a destination string that is "longer" than
>> the source, since UTF-8 characters can take multiple bytes, right? There
>> seems to be at least one hunk in your patch where you don't take this
>> into account, so the destination buffer could overflow.
>
> This has been fixed. There is now always a length parameter.
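>
> For example, a caller now looks roughly like this (illustrative names;
> the size is for the host buffer, where a single Atari character can
> become up to three UTF-8 bytes):
>
> #include <stdio.h>   /* FILENAME_MAX, printf */
>
> /* conversion function as sketched under point 1 */
> extern void Str_AtariToHost(const char *src, char *dst, size_t dstlen,
>                             char replacementChar);
>
> void Example_PrintHostName(const char *pszAtariName)
> {
>     char szHostName[FILENAME_MAX];
>
>     Str_AtariToHost(pszAtariName, szHostName, sizeof(szHostName), '_');
>     printf("host name: %s\n", szHostName);
> }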
>
>> 4) What if the (Linux) host system does not use a UTF-8 locale? I think
>> there might still be some people around who use some latin-x locale
>> instead.
>
> If you define the macro USE_LOCALE_CHARSET, the locale-based conversion
> is forced to be used (by default it is used only under Windows). It
> should work on Linux for e.g. "latin-x" locales if they are installed.
> So I haven't used iconv, as mbtowc() and wctomb() already provide the
> required functionality.
>
> For Linux with UTF-8 there are two options: it can use the special
> Atari<->UTF-8 conversion functions (the default) or the locale-based
> functions. In general I would prefer using the special UTF-8 functions,
> as they work without any dependencies on locales. In my view, UTF-8 is
> well suited as an internal Unicode data format anyway.
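>
> For illustration, encoding one code point from the Atari mapping table
> into UTF-8 takes only a few lines (sketch with an illustrative name;
> all Atari ST characters lie in the Basic Multilingual Plane, so at
> most three bytes are needed):
>
> /* Encode a Unicode code point below 0x10000 as UTF-8.
>    Returns the number of bytes written to out[]. */
> static int Sketch_EncodeUtf8(unsigned int cp, char *out)
> {
>     if (cp < 0x80)
>     {
>         out[0] = (char) cp;
>         return 1;
>     }
>     if (cp < 0x800)
>     {
>         out[0] = (char) (0xC0 | (cp >> 6));
>         out[1] = (char) (0x80 | (cp & 0x3F));
>         return 2;
>     }
>     out[0] = (char) (0xE0 | (cp >> 12));
>     out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
>     out[2] = (char) (0x80 | (cp & 0x3F));
>     return 3;
> }
>
> Decoding for the other direction is just as small, so no external
> library is needed.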
>
> I have read that the OSX filesystem returns filenames with composed
> characters in a normalized UTF-8 form where the base character and
> the diacritical mark are returned as separate characters (decomposed
> normalization form, NFD), while Linux and most other systems use
> precomposed characters (NFC). The Atari ST charset and Latin-1 also
> contain many precomposed characters. Therefore I have added support
> that converts the decomposed representation of OSX into its precomposed
> equivalent, but this works only if you do not define USE_LOCALE_CHARSET.
> (Although I haven't tested it yet on OSX.)
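>
> The precomposition step is essentially a pair lookup; in sketch form
> (the table content here is just an example, the real mapping is much
> larger and the names are illustrative):
>
> struct Precomposed { unsigned int base, mark, composed; };
>
> static const struct Precomposed PrecomposedTable[] =
> {
>     { 'a', 0x0308, 0x00E4 },   /* a + combining diaeresis -> ä */
>     { 'o', 0x0308, 0x00F6 },   /* o + combining diaeresis -> ö */
>     { 'u', 0x0308, 0x00FC },   /* u + combining diaeresis -> ü */
> };
>
> /* Return the precomposed code point for base+mark, or 0 if unknown
>    (in that case both code points are kept as they are). */
> static unsigned int Sketch_Precompose(unsigned int base, unsigned int mark)
> {
>     unsigned int i;
>
>     for (i = 0; i < sizeof(PrecomposedTable) / sizeof(PrecomposedTable[0]); i++)
>     {
>         if (PrecomposedTable[i].base == base &&
>             PrecomposedTable[i].mark == mark)
>             return PrecomposedTable[i].composed;
>     }
>     return 0;
> }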
>
> I haven't yet implemented any configuration. For modern Linux, OSX
> and Windows no configuration should be necessary. A configuration
> option on the Hatari command line and/or in the GUI would only be
> needed to also support older Linux versions that use a locale other
> than UTF-8. In that case the USE_LOCALE_CHARSET preprocessor macro
> can simply be replaced by a boolean configuration variable. The locale
> to be used is read from the LC_ALL environment variable (by a
> setlocale(LC_ALL, "") call).
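>
> In sketch form, that startup step is just (illustrative function name):
>
> #include <locale.h>
> #include <stdio.h>
>
> /* Pick up the user's locale (LC_ALL / LANG) from the environment so
>    that mbtowc()/wctomb() know the host character set. */
> static void Sketch_InitLocale(void)
> {
>     if (setlocale(LC_ALL, "") == NULL)
>         fprintf(stderr, "Warning: could not set locale from environment\n");
> }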
>
> Max


