Re: [hatari-devel] GEMDOS filename handling



Hi,

2014-08-09 23:56 GMT+02:00 Eero Tamminen <oak@xxxxxxxxxxxxxx>:
>> > After improving the test (attached), and testing it with
>> > TOS 2.x (results attached), I found out that even native
>> > TOS has problems with files & directories having following
>> > characters in them: ' ', '*', '.', '?', '\'.
>
> I've now tested almost all TOS versions:
> - Except for TOS v4, all real TOS versions behaved like stated
>   above.  TOS v4 additionally accepts ' ' in file names.

yes, I get similar results when using a floppy disk image.

>> Can it be that the host system does not allow these characters?
>
> This was testing with plain TOS, on an emulated floppy image.
> There was no GEMDOS HD emulation or host files involved.
>
> Are you sure your earlier TOS testing worked properly
> (your test program referred drive L: in the paths, maybe
> that was mapped to GEMDOS drive in your hatari.cfg)?

Previously I was only testing the GEMDOS HD emulation. When
I said I was testing with e.g. TOS 1.04, I actually meant that I ran
Hatari with that TOS version but still used the GEMDOS HD
drive emulation. So that was a misunderstanding.

> -> It seems that TOS actually validates/rejects paths.
>    Hatari GEMDOS HD emulation should do the same.

That is something you can probably decide better than I can.
It depends on whether you want the GEMDOS HD layer to behave exactly
like TOS or whether you see it more as a "network file system" which
may be allowed to have slightly different filename limitations.

> -> I think '\' & '.' don't need to be specifically filtered out,
>    they get rejected by last '.' clipping and path not matching.

yes

> -> This leaves '*' and '?' as something that needs to be checked
>    and paths with them rejected at GEMDOS call level. In case of
>    TOS <v4, also paths with ' ' in them need to be rejected.

Yes, I think this makes sense. Interestingly, those characters are
allowed in certain TOS versions (sometimes only in folder names)
while rejected in others.
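
A minimal sketch of the check discussed above, not the actual Hatari code. It rejects any path containing '*' or '?', and additionally ' ' when the emulated TOS is older than v4. The function name `gemdos_path_is_valid` and the `tos_major` parameter are hypothetical, and the per-version differences between file and folder names mentioned above are ignored. Calls that legitimately take wildcards, such as Fsfirst, would have to skip this check.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of Hatari: returns true when a GEMDOS
 * path contains none of the characters that the tests in this thread
 * show real TOS rejecting. tos_major is the major version of the
 * emulated TOS. */
bool gemdos_path_is_valid(const char *path, int tos_major)
{
	/* '*' and '?' are rejected for every TOS version here;
	 * ' ' is additionally rejected for versions before TOS v4. */
	const char *rejected = (tos_major < 4) ? "*? " : "*?";

	/* strpbrk() returns NULL when none of the characters occur. */
	return strpbrk(path, rejected) == NULL;
}
```

For example, `gemdos_path_is_valid("C:\\MY FILE.TXT", 2)` is false while the same path with `tos_major` set to 4 is accepted.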

>> On my system (Ubuntu 14.04 VM) files with those characters
>> can be created and removed by the emulated system, see attached
>> logs (linux.zip). Only '\' and '/' do not work.
>
> It's a TOS limitation, not a host one.

as stated earlier, I was testing with GEMDOS HD emulation.

>> >> The other alternatives are:
>> >> - instead of '?', using glob() pattern with '[]', which contains all
>> >>   the invalid characters (after your 8-bit char patch, amount of
>> >>   invalid chars is small enough to be handled like that).
>> >>
>> >> - or first trying with INVALID_CHAR and if that doesn't match, try
>> >>   with pattern
>>
>> Not sure if this would work, as for Windows with cp1252 encoding there
>> are more invalid characters (all the greek characters 0xC0..0xDC, and
>> some others)
>
> In which way these characters are invalid?
>
> Earlier you stated that only these characters are problematic:
>    "Characters on Windows:    " * / : < > ? \ |"

These characters are not allowed in pathnames on Windows (when
using GEMDOS HD emulation). The errors from the host are returned
to the emulated system. On Linux there is no such limitation:
a pathname can contain any character except '/' and
'\0'.
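
The host-side difference can be sketched as a small predicate. This is only an illustration covering the characters named in this thread, not code from the patch; the name `host_rejects_char` and the `windows_host` flag are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when the host file system would refuse
 * character c inside a single file or directory name. Only the
 * characters listed in this thread are considered. */
bool host_rejects_char(char c, bool windows_host)
{
	/* The string terminator and the path separator '/' can never
	 * be part of a name, on Linux or on Windows. */
	if (c == '\0' || c == '/')
		return true;

	/* Windows additionally refuses: " * : < > ? \ | */
	if (windows_host)
		return strchr("\"*:<>?\\|", c) != NULL;

	return false;
}
```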

Under Windows my patch does not map the Atari characters to
Unicode. The characters are just mapped to the ANSI codepage
which is currently in use. This means not all characters can
be mapped (only those which exist in the current codepage).

This is due to a limitation of the Windows standard C runtime library,
which expects paths to be encoded in the current codepage in file system
functions (stat, fopen, mkdir, rmdir, rename, etc.). There are also
Unicode versions of those functions with slightly different names
which expect filenames to be encoded in UTF-16. If I wanted to use
those, I would have to provide wrappers for all such functions used
in gemdos.c, which would map filenames from UTF-8 to UTF-16 and
invoke the Unicode version of the function under Windows. I thought
this was too big a change and would result in too much Windows-specific
code. Also, the Windows opendir and readdir implementations
would need some changes. (I had started such an approach, see attached file,
but did not continue it for the aforementioned reasons.)

- Max

#ifdef WIN32
/* On the Windows platform the standard filesystem functions use the
 * Windows code page for pathnames. Internally NTFS stores all pathnames
 * in Unicode.
 * The code below provides a thin wrapper layer for Windows around the
 * filesystem functions used by gemdos.c. The wrappers accept UTF-8
 * encoded pathnames, which makes the API similar to Linux and OSX.
 * Using UTF-8 on the host allows all Atari ST characters to be
 * represented independently of the locale.
 */

/* Redirect the filesystem calls made by gemdos.c to the wrappers below. */
#define stat  utf8_stat
#define fopen utf8_fopen
#define utime utf8_utime
#define access utf8_access
#define mkdir utf8_mkdir
#define rmdir utf8_rmdir
#define chmod utf8_chmod
#define unlink utf8_unlink
#define rename utf8_rename



int utf8_stat(const char *path, struct _stat *buffer)
{
	wchar_t wpath[MAX_PATH];
	utf8ToUtf16(path, wpath, MAX_PATH);
	return _wstat(wpath, buffer);
}

FILE *utf8_fopen(const char *filename, const char *mode)
{
	wchar_t wfilename[MAX_PATH];
	wchar_t wmode[10];
	utf8ToUtf16(filename, wfilename, MAX_PATH);
	utf8ToUtf16(mode, wmode, 10);
	return _wfopen(wfilename, wmode);
}

int utf8_utime(const char *filename, struct _utimbuf *times)
{
	wchar_t wfilename[MAX_PATH];
	utf8ToUtf16(filename, wfilename, MAX_PATH);
	return _wutime(wfilename, times);
}

int utf8_access(const char *path, int mode)
{
	wchar_t wpath[MAX_PATH];
	utf8ToUtf16(path, wpath, MAX_PATH);
	return _waccess(wpath, mode);
}

/* TODO: opendir, readdir, scandir */

int utf8_mkdir(const char *dirname)
{
	wchar_t wdirname[MAX_PATH];
	utf8ToUtf16(dirname, wdirname, MAX_PATH);
	return _wmkdir(wdirname);
}

int utf8_rmdir(const char *dirname)
{
	wchar_t wdirname[MAX_PATH];
	utf8ToUtf16(dirname, wdirname, MAX_PATH);
	return _wrmdir(wdirname);
}

int utf8_chmod(const char *filename, int pmode)
{
	wchar_t wfilename[MAX_PATH];
	utf8ToUtf16(filename, wfilename, MAX_PATH);
	return _wchmod(wfilename, pmode);
}

int utf8_unlink(const char *filename)
{
	wchar_t wfilename[MAX_PATH];
	utf8ToUtf16(filename, wfilename, MAX_PATH);
	return _wunlink(wfilename);
}

int utf8_rename(const char *oldname, const char *newname)
{
	wchar_t woldname[MAX_PATH];
	wchar_t wnewname[MAX_PATH];
	utf8ToUtf16(oldname, woldname, MAX_PATH);
	utf8ToUtf16(newname, wnewname, MAX_PATH);
	return _wrename(woldname, wnewname);
}

#endif /* WIN32 */
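
The wrappers above call utf8ToUtf16(), which is not included in the attachment. As a rough idea of what such a helper has to do, here is a portable sketch of a UTF-8 to UTF-16 conversion. It is an assumption, not the author's implementation: the name `utf8_to_utf16_sketch`, the boolean return value and the `uint16_t` buffer type are all hypothetical (on Windows the buffer would be `wchar_t`, which is 16 bits wide there). It rejects truncated sequences and encoded surrogates but does not reject overlong encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the utf8ToUtf16() helper used by the
 * wrappers. Converts the NUL-terminated UTF-8 string src into UTF-16
 * code units in dst, which has room for dstlen units including the
 * terminator. Returns false on malformed input or if dst is too small. */
bool utf8_to_utf16_sketch(const char *src, uint16_t *dst, size_t dstlen)
{
	const unsigned char *s = (const unsigned char *)src;
	size_t n = 0;

	while (*s) {
		uint32_t cp;
		int extra;

		/* The lead byte determines the length of the sequence. */
		if (*s < 0x80)                { cp = *s;        extra = 0; }
		else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
		else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
		else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
		else return false;
		s++;

		/* Each continuation byte (10xxxxxx) adds six more bits.
		 * A premature NUL fails this test, so we never read past
		 * the end of the string. */
		while (extra-- > 0) {
			if ((*s & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (*s++ & 0x3F);
		}

		/* Surrogates and values above U+10FFFF are not valid. */
		if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
			return false;

		if (cp < 0x10000) {
			/* One code unit, keeping room for the terminator. */
			if (n + 1 >= dstlen)
				return false;
			dst[n++] = (uint16_t)cp;
		} else {
			/* Surrogate pair, keeping room for the terminator. */
			if (n + 2 >= dstlen)
				return false;
			cp -= 0x10000;
			dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
			dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
		}
	}

	if (n >= dstlen)	/* only possible when dstlen is 0 */
		return false;
	dst[n] = 0;
	return true;
}
```

A Windows-only version could instead delegate to the Win32 function MultiByteToWideChar() with the CP_UTF8 code page. Either way, the wrappers would ideally check the result and fail the call when a path does not fit into MAX_PATH units.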

