Re: [hatari-devel] GEMDOS filename handling



Hi,

On Saturday 26 July 2014, Max Böhm wrote:
> I have validated the patch on Windows/Linux/OSX as described below.
> During the validation I found that one aspect of the OSX-related
> UTF-8 conversion was still incorrect, so I have updated my patch
> again (sorry for so many updates, this is hopefully the final version
> now!):
> 
> GEMDOS TOS name <-> UTF-8 conversion patch:
> https://gist.github.com/2610cd9df21dc827fe45

Thanks, the patch looks good except for a few trivial things:
- most of the new functions are used only in str.c, so they
  could be static
- the Str_HostToAtari() call could be done after the string has
  been clipped to 8+3 chars, replacing "strcpy(dst, src);"

The INVALID_CHAR matching change needs to be a separate patch, see the bottom of this mail.


> Details on the validation
> -------------------------
> I've created a test program, to be run in the emulated system, which
> first creates one file for each character code 32..255 (using Fcreate).
>
> In a second step, after you press a key, it deletes the files (using
> Fsfirst and Fdelete). It writes a logfile "atari.log" of the form:
> 
> hexcode - 'filename' - <status of Fcreate>
> ...
> hexcode - 'filename' - <status of Fsfirst>, <status of Fdelete>
> 
> for example:
> ...
> 80 - 'L:\TEST\80_Ç.TXT' - created
> 81 - 'L:\TEST\81_ü.TXT' - created
> ...
> 80 - 'L:\TEST\80_Ç.TXT' - found, deleted
> 81 - 'L:\TEST\81_ü.TXT' - found, deleted
> ...
> 
> You can find the source code of the test program here:
> https://gist.github.com/6c7c4340e2a656b2066b

Good, this really needed a test case.

 
> Then I ran this program on the Hatari hg version (with my patch
> applied) on the platforms Windows 7, Ubuntu 14.04, and OSX
> 10.6.
> 
> The atari.log file created by the test program within the emulated
> system uses the Atari ST character map. To make it readable on the
> host, I converted it to UTF-8 using the recode utility:
> 
> recode AtariST..UTF-8 <atari.log >atari.log-utf8
> 
> On each platform, between the two steps, I listed the contents of
> the TEST directory on the host:
> 
> Windows: dir /B /O:N TEST >windows.log
> Linux:   ls TEST >linux.log
> OSX:     ls TEST >osx.log
> 
> The results of the validation for each platform (atari.log and the
> host listing) can be found here:
> 
> Windows: https://gist.github.com/5b0c1f311860829fb04f
> Linux:   https://gist.github.com/dd23685a872ebd1c544b
> OSX:     https://gist.github.com/bee6f92b11fe695e9430
> 
> This shows that the character mapping works as expected.

This is a good start, but the main point of emulation is to match
the behavior of the emulated system.  The same test also needs to be
repeated with different TOS versions [1] on a floppy or HD image [2].
Could you do that testing too?

If behaviour differs between TOS versions, we need to consider
whether the differences should be emulated or whether we just pick
the safest behaviour.

[1] Latest EmuTOS, TOS 1.0x, TOS 1.6x, TOS 2.0x, TOS 3.x, TOS 4.x.

[2] The number of files in a FAT file system root directory is limited
    (and the max path+file name length is 255), so run the test in
    a subdirectory.


> ===================================
> 
> During the testing I noticed a few other things in the GEMDOS layer:
> 
> The GEMDOS emulation layer can create files on the host with
> certain characters in the path name that afterwards cannot be found
> (by Fsfirst) or deleted (by Fdelete):
> 
> Characters on Windows:    " * / : < > ? \ |
> Characters on Linux/OSX:  / \
> 
> I assume this is intentional.

Str_Filename2TOSname() filters out characters that are invalid for Atari
when the emulation populates the DTA with host names for the Fsnext()
call, i.e. it's for the host->Atari direction.

For the Atari->host direction, when the emulated program itself specifies
a file name, e.g. for Fcreate(), the only thing done to such file names is
clipping them to 8+3 characters.  Clipping is needed because some Atari
programs pass longer strings (e.g. because 8+3 file names stored in the
binary aren't separated by terminating zeros) and real TOS clips them.

The clipping is done in gemdos.c::clip_to_83().  I guess it could be
moved to str.c, some host character filtering could be added to it,
and the function renamed to e.g. Str_Filename2Host().

I think this should be a separate patch.


> But one other behaviour of the GEMDOS emulation is not fully clear to
> me. The GEMDOS layer replaces certain characters with '+' in names
> returned by Fsfirst/Fsnext:
> 
> ' * . : ? { } 0x7F
> 
> I don't understand why this is done. As a result, those files are not
> shown with their original character in the emulated system,
> and you get an error when you try to drag the file into another folder
> or onto the Trash icon.
>
> I know the GEMDOS layer matches the inserted '+' as a '?' wildcard to
> catch such cases, but this also matches other files, which is why I
> commented it out. In my view, with my patch the replacement of special
> characters by '+' would no longer be needed at all. What do you think?

There can be at most one '.' in TOS file names; it's only the additional
'.' characters that are filtered out.

A lot of things could break if multiple '.' chars were passed through
Fsnext().  Some Atari programs expect at most 3 chars after the
first '.', some after the last '.'.  This filtering is mainly for files
coming from elsewhere, i.e. things like:
  this.is.file.with.lots.of.dots.txt


If you want to remove host->Atari filtering for the other characters
and replace the '?' matching for the '+' INVALID_CHAR with a "[.+]"
pattern, I'm fine with that *if* it's tested to work with the following:
1. GEM desktop and file selectors in different TOS versions [1],
   whether all files show up fine in them and can be selected
2. several other things using and handling Fsnext() results:
   - replacement file selectors (boxkite, selectric...)
   - programs listing and handling file name patterns:
     GUI copiers, compressors (egale, twoinone, arcshl,
     lzhshell, pacshell, stzip...) etc


	- Eero


