Re: [hatari-devel] GEMDOS filename handling



Hi,

On Sunday 10 August 2014, Max Böhm wrote:
> 2014-08-09 23:56 GMT+02:00 Eero Tamminen <oak@xxxxxxxxxxxxxx>:
> Previously I was only testing the GEMDOS HD emulation. When
> I said I was testing with e.g. TOS 1.04, I actually meant that I ran
> Hatari with that TOS version but still used the GEMDOS HD
> drive emulation. So that was a misunderstanding.

HD emulation intercepts the GEMDOS call before it reaches TOS from the
Atari program, so the TOS version in use shouldn't change how things work,
unless there's GEMDOS HD emulation code that explicitly does that (see below).


>> -> It seems that TOS actually validates/rejects paths.
>>    Hatari GEMDOS HD emulation should do the same.
> 
> that is something you can probably decide better than I can.
> It depends on whether you want the GEMDOS HD layer to behave exactly
> like TOS or whether you see it more like a "network file system" which
> may be allowed to have slightly different filename limitations.

Some Atari programs (like ST-Zip) expect specific error codes
in specific situations and don't work correctly without them.
Such programs may also do different things under different TOS
versions (the behavior could even come from the compiler's C library),
so the errors should match.

Hatari is an emulator, so it should match the emulated system
as closely as possible.  For example, the GEMDOS HD emulation of
Dfree() and Fread() checks which TOS version is being used and
changes its behavior slightly based on that.
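To illustrate such a TOS-version-dependent check, here is a minimal sketch (not Hatari's actual code) for the characters discussed further down in this thread: '*' and '?' are rejected in paths given to calls like Fopen() or Dcreate() (Fsfirst() masks are a different matter), and ' ' is rejected when the emulated TOS is older than 4.00. The function name gemdos_check_path_chars, the BCD version argument and the use of -34 (EPTHNF) as the returned error are assumptions; which error real TOS returns, and whether some versions accept these characters in folder names, still needs checking per call and per version.

```c
#include <assert.h>
#include <stddef.h>

/* GEMDOS "path not found" error. Which error TOS really returns for an
 * invalid character is an assumption and must be verified per call. */
#define GEMDOS_EPTHNF (-34)

/*
 * Hypothetical helper (not Hatari's API): returns 0 if a path given to a
 * call such as Fopen() or Dcreate() contains only characters accepted by
 * the emulated TOS, otherwise a negative GEMDOS error code.
 * tos_version uses the BCD form found in the TOS header, e.g. 0x0104.
 * Simplification: '*' and '?' are rejected for every TOS version here.
 */
static int gemdos_check_path_chars(const char *path, unsigned int tos_version)
{
    assert(path != NULL);
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '*' || *p == '?')
            return GEMDOS_EPTHNF;      /* wildcard in a concrete path */
        if (*p == ' ' && tos_version < 0x0400)
            return GEMDOS_EPTHNF;      /* spaces rejected before TOS 4 */
    }
    return 0;
}
```

With this sketch, "C:\AUTO\A B.PRG" is rejected for version 0x0104 but accepted for 0x0404, while "C:\AUTO\*.PRG" is rejected for both.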

 
> > -> I think '\' & '.' don't need to be specifically filtered out,
> >    they get rejected by last '.' clipping and path not matching.
> 
> yes
> 
> > -> This leaves '*' and '?' as something that needs to be checked
> >    and paths with them rejected at GEMDOS call level. In case of
> >    TOS <v4, also paths with ' ' in them need to be rejected.
> 
> Yes, I think this makes sense. Interestingly, those characters are
> allowed in certain TOS versions (sometimes only in folder names)
> while prevented in others.
> 
> >> On my system (Ubuntu 14.04 VM) files with those characters
> >> can be created and removed by the emulated system, see attached
> >> logs (linux.zip). Only '\' and '/' do not work.
> > 
> > It's a TOS limitation, not a host one.
> 
> as stated earlier, I was testing with GEMDOS HD emulation.
> 
> >> >> The other alternatives are:
> >> >> - instead of '?', using a glob() pattern with '[]' which contains
> >> >>   all the invalid characters (after your 8-bit char patch, the
> >> >>   number of invalid chars is small enough to be handled like that).
> >> >> - or first trying with INVALID_CHAR and, if that doesn't match,
> >> >>   trying with the pattern.
> >> 
> >> Not sure if this would work, as for Windows with cp1252 encoding there
> >> are more invalid characters (all the Hebrew characters 0xC0..0xDC, and
> >> some others).
> > 
> > In which way are these characters invalid?
> > 
> > Earlier you stated that only these characters are problematic:
> >    "Characters on Windows:    " * / : < > ? \ |"
> 
> These characters are not allowed in pathnames on Windows (when
> using GEMDOS HD emulation). The errors from the host are returned
> to the emulated system. On Linux you don't have such a limitation;
> there a pathname can contain any characters with the exception of
> '/' and '\0'.
> 
> Under Windows my patch does not map the Atari characters to
> Unicode. The characters are just mapped to the ANSI codepage
> which is currently in use. This means not all characters can
> be mapped (only those which exist in the current codepage).
> 
> This is due to a limitation of the Windows standard C runtime library,
> which expects paths to be encoded in the current codepage in file system
> functions (stat, fopen, mkdir, rmdir, rename, etc.).

There's no way to force that to use UTF-8?


> There are also
> Unicode versions of those functions with slightly different names,
> which expect filenames to be encoded in UTF-16. If I wanted to use
> those, I would have to provide wrappers for all such functions used
> in gemdos.c, which would map filenames from UTF-8 to UTF-16 and
> invoke the Unicode version of the function under Windows. I thought
> this was too big a change and would result in too much
> Windows-specific code. Also, the Windows opendir and readdir
> implementations would need some changes. (I had started such an
> approach, see attached file, but did not continue it for the reasons
> above.)

Hm.  Looks pretty ugly.
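For reference, one such wrapper could look roughly like the sketch below. host_fopen and utf8_to_utf16 are hypothetical names; on Windows the sketch converts with the Win32 MultiByteToWideChar() function and calls the CRT's _wfopen(), elsewhere it hands the UTF-8 path straight to fopen(), since Linux treats file names as plain byte strings. Every other host call used in gemdos.c (stat, mkdir, rmdir, rename, opendir/readdir, ...) would need a similar wrapper, which is where the amount of Windows-specific code comes from.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Convert a UTF-8 string to a newly allocated UTF-16 string (caller frees).
 * Returns NULL on invalid UTF-8 or allocation failure. */
static wchar_t *utf8_to_utf16(const char *utf8)
{
    /* With length -1 the returned size includes the terminating NUL */
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, NULL, 0);
    if (len == 0)
        return NULL;
    wchar_t *wide = malloc((size_t)len * sizeof *wide);
    if (wide == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, wide, len) == 0) {
        free(wide);
        return NULL;
    }
    return wide;
}
#endif

/* Hypothetical wrapper: takes a UTF-8 host path on every platform. */
static FILE *host_fopen(const char *path_utf8, const char *mode)
{
    assert(path_utf8 != NULL && mode != NULL);
#ifdef _WIN32
    wchar_t *wpath = utf8_to_utf16(path_utf8);
    wchar_t *wmode = utf8_to_utf16(mode);
    FILE *fp = NULL;
    if (wpath != NULL && wmode != NULL)
        fp = _wfopen(wpath, wmode);   /* UTF-16 variant of fopen() */
    free(wpath);                      /* free(NULL) is a no-op */
    free(wmode);
    return fp;
#else
    /* File names are byte strings here, so UTF-8 passes through unchanged */
    return fopen(path_utf8, mode);
#endif
}
```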


As a summary of current situation, the use-cases we have are:

1. Atari->host: supporting Atari programs, i.e. handling file names
   embedded in Atari apps and used in different GEMDOS calls
   - assumes Atari file names were converted correctly when
     they were originally transferred to the host
   - needs to do the same as TOS:
     A) clip them,
     B) reject ones with invalid chars with the correct error,
     and
     C) convert the Atari encoding to the host one

   The current GEMDOS code does A), the first part of this mail discusses
   B), and your patch does C), but it's not a full solution, as
   discussed above.

   Q: when transferring files from Atari system to Windows, what
      happens for copied files which have " / : < > |" characters?
      Is there some common mapping which could be used for Windows?
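For 1A), a minimal sketch of clipping one path component to the 8.3 form, using the last-'.' rule mentioned earlier in this thread: at most 8 characters before the last '.', at most 3 after it. clip_to_8_3 is a hypothetical name, and whether every TOS version clips exactly like this is an assumption to verify against real TOS.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: clip a single path component in place to 8.3,
 * treating the LAST '.' as the extension separator. */
static void clip_to_8_3(char *name)
{
    assert(name != NULL);
    char *dot = strrchr(name, '.');

    if (dot == NULL) {
        /* no extension: just clip the base name */
        if (strlen(name) > 8)
            name[8] = '\0';
        return;
    }
    if ((size_t)(dot - name) > 8) {
        /* move ".ext" (with its terminator) right after the 8th char */
        memmove(name + 8, dot, strlen(dot) + 1);
        dot = name + 8;
    }
    if (strlen(dot + 1) > 3)
        dot[4] = '\0';   /* keep '.' plus 3 extension chars */
}
```

For example, "LONGFILENAME.TEXT" becomes "LONGFILE.TEX", "VERYLONGNAME" becomes "VERYLONG", and "DESKTOP.INF" is left unchanged.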

2. host->Atari->host file interoperability, i.e. handling GEMDOS calls
   for file names that originated from the host through Fsfirst(), and
   the user selecting some of those files
   - host file names need to be mapped to TOS-accepted names,
     and there needs to be some way to match the original
     host file based on the converted name.  Problems are related to:
     A) file name encoding differences
     B) too long file names (clipping and matching clipped names)
     C) host names having characters which either aren't valid
        in GEMDOS file names (e.g. ?), or are
     D) not even valid as Atari characters (weird UTF-8 chars).

   The initial GEMDOS code handled mapping ASCII and cases B) & C),
   although its definition of invalid chars turned out to be too strict.
   The patch you sent should handle D) & A) for other encodings, but
   I still need to test it before integrating it.
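For 2C) and the bracket-pattern alternative mentioned earlier in the thread, a minimal sketch of the round trip: host names have GEMDOS-invalid characters replaced by a substitute, and when an Atari program passes such a name back, each substitute becomes a '[...]' bracket expression so the original host entry can be found. GEMDOS_INVALID_CHARS, SUBST_STR and all function names are hypothetical, the character set is only an example, and clipping and case-insensitive matching are left out. It uses POSIX fnmatch(), which implements the same pattern language as glob() but tests a single name, so it can be applied to each readdir() entry; a Windows build would need its own matcher.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Example only: host characters treated as invalid in GEMDOS names. */
#define GEMDOS_INVALID_CHARS "*?\\:"
/* Hypothetical substitute shown to the Atari side for such characters. */
#define SUBST_STR "@"

/* Host -> Atari: copy the name, replacing each invalid character with
 * the substitute character. */
static void host_to_atari_name(const char *host, char *atari, size_t size)
{
    size_t i;
    assert(size > 0);
    for (i = 0; host[i] != '\0' && i + 1 < size; i++)
        atari[i] = strchr(GEMDOS_INVALID_CHARS, host[i]) ? SUBST_STR[0] : host[i];
    atari[i] = '\0';
}

/* Atari -> host: turn a name into an fnmatch() pattern. Each substitute
 * becomes a bracket expression matching any invalid character or a real
 * substitute character; glob-special characters are bracketed so they
 * match only themselves. Returns 0 on success, -1 if it does not fit. */
static int atari_name_to_pattern(const char *atari, char *pattern, size_t size)
{
    static const char subst_set[] = "[" GEMDOS_INVALID_CHARS SUBST_STR "]";
    size_t len = 0;

    if (size == 0)
        return -1;
    for (const char *p = atari; *p != '\0'; p++) {
        char lit[4] = { *p, '\0', '\0', '\0' };
        const char *piece = lit;

        if (*p == SUBST_STR[0]) {
            piece = subst_set;
        } else if (*p == '*' || *p == '?' || *p == '[') {
            lit[0] = '[';          /* e.g. "[*]" matches a literal '*' */
            lit[1] = *p;
            lit[2] = ']';
        }
        size_t n = strlen(piece);
        if (len + n + 1 > size)
            return -1;
        memcpy(pattern + len, piece, n);
        len += n;
    }
    pattern[len] = '\0';
    return 0;
}

/* Does a host directory entry correspond to the name the Atari side used? */
static int host_name_matches(const char *atari, const char *host)
{
    char pattern[256];

    if (atari_name_to_pattern(atari, pattern, sizeof pattern) != 0)
        return 0;
    /* FNM_NOESCAPE: a backslash in the pattern is an ordinary character */
    return fnmatch(pattern, host, FNM_NOESCAPE) == 0;
}
```

Several host entries can map to the same Atari name (e.g. "a:b" and "a?b" both become "a@b"), so the pattern matches all of them; a real implementation would have to pick one deterministically, for example the first match found.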

I also need to improve cases 1B) and 2C).


	- Eero

PS. I intended to test & integrate your patch this weekend, but got
nerd-sniped by the SDL GUI keyboard navigation. Sorry about that.


