Re: [hatari-devel] GEMDOS filename handling
- To: "hatari-devel@xxxxxxxxxxxxxxxxxxx" <hatari-devel@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [hatari-devel] GEMDOS filename handling
- From: Max Böhm <mboehm3@xxxxxxxxx>
- Date: Wed, 30 Jul 2014 23:33:55 +0200
Hi Eero,
2014-07-30 18:39 GMT+02:00 Eero Tamminen <oak@xxxxxxxxxxxxxx>:
> Hi,
>
> On Saturday 26 July 2014, Max Böhm wrote:
>> I have validated the patch on Windows/Linux/OSX as described below.
>> During the validation I found that still an aspect of the OSX related
>> UTF-8 conversion was incorrect and therefore have updated again my
>> patch (sorry for so many updates, this is hopefully the final version
>> now!):
>>
>> GEMDOS TOS name <-> UTF-8 conversion patch:
>> https://gist.github.com/2610cd9df21dc827fe45
>
> Thanks, the patch looks good except for few trivial things:
> - most of the new functions are used only in str.c so they
> could be static
This is done.
> - Str_HostToAtari() call could be after string has been
> clipped to 8+3 chars and replace "strcpy(dst, src);"
This can't be done: in UTF-8 encoding the host path may take more than
one byte per character, so clipping the host path to 8+3 bytes may clip
too much.
The updated patch is here:
https://gist.github.com/bfc53cd886b204ea22d8
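To illustrate the byte-versus-character point, here is a minimal sketch (not taken from the patch; the function names utf8_char_count and utf8_prefix_bytes are made up for this example). It counts UTF-8 code points by skipping continuation bytes and computes how many bytes the first N characters really occupy. It assumes well-formed UTF-8 input and counts code points only, so a decomposed form (base letter plus combining mark) would count as two.

```c
#include <stddef.h>

/* Count characters (code points) in a well-formed UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped. */
size_t utf8_char_count(const char *s)
{
	size_t n = 0;
	for (; *s; s++) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	}
	return n;
}

/* Return how many bytes the first max_chars characters of s occupy.
 * If s has fewer characters, the full byte length is returned. */
size_t utf8_prefix_bytes(const char *s, size_t max_chars)
{
	size_t i = 0, chars = 0;
	while (s[i]) {
		if (((unsigned char)s[i] & 0xC0) != 0x80) {
			/* Start of a new character: stop once the limit is reached. */
			if (chars == max_chars)
				break;
			chars++;
		}
		i++;
	}
	return i;
}
```

For a base name of eight accented characters such as "Çüéâäàåç", utf8_char_count() gives 8 while strlen() gives 16, so a clip at 8 bytes would keep only the first four characters.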
>
> INVALID_CHAR matching change needs to be separate, see bottom of mail.
I have not included the INVALID_CHAR matching change in the patch.
>> Details on the validation
>> -------------------------
>> I've created a test program to be run in the emulated system which
>> first creates one file for each character code 32..255 (using Fcreate).
>>
>> In a second step, after you press a key, it deletes the files (using
>> Fsfirst and Fdelete). It writes a logfile "atari.log" of the form:
>>
>> hexcode - 'filename' - <status of Fcreate>
>> ...
>> hexcode - 'filename' - <status of Fsfirst>, <status of Fdelete>
>>
>> for example:
>> ...
>> 80 - 'L:\TEST\80_Ç.TXT' - created
>> 81 - 'L:\TEST\81_ü.TXT' - created
>> ...
>> 80 - 'L:\TEST\80_Ç.TXT' - found, deleted
>> 81 - 'L:\TEST\81_ü.TXT' - found, deleted
>> ...
>>
>> You find the source code of the test program here:
>> https://gist.github.com/6c7c4340e2a656b2066b
>
> Good, this really needed a test-case.
>
>
>> Then I've run this program on the hg Hatari version (to which I have
>> applied my patch) on the platforms Windows 7, Ubuntu 14.04, and OSX
>> 10.6.
>>
>> The atari.log file created by the test program within the emulated
>> system uses the AtariST character map. To make it readable on the
>> host I've converted it to utf8 using the recode utility:
>>
>> recode AtariST..UTF-8 <atari.log >atari.log-utf8
>>
>> On each platform between the two steps I listed the contents of
>> the TEST directory on the host.
>>
>> Windows: dir /B /O:N TEST >windows.log
>> Linux: ls TEST >linux.log
>> OSX: ls TEST >osx.log
>>
>> The results of the validation for each platform (atari.log and host.log)
>> can be found here:
>>
>> Windows: https://gist.github.com/5b0c1f311860829fb04f
>> Linux: https://gist.github.com/dd23685a872ebd1c544b
>> OSX: https://gist.github.com/bee6f92b11fe695e9430
>>
>> This shows that the character mapping works as expected.
>
> This is good start, but the main point of emulation is to match
> the behavior of emulated system. You need to repeat the same test
> also for different TOS versions [1] on a floppy or HD image.
> Could you do that testing too?
>
> If behaviour differs between TOS versions, we need to think
> whether differences should be emulated or whether we just pick
> the safest behavior.
>
> [1] Latest EmuTOS, TOS 1.0x, TOS 1.6x, TOS 2.0x, TOS 3.x, TOS 4.x.
>
> [2] number of files on FAT file system root dir is limited
> (and max path+file name length is 255), so run test in
> a sub directory.
I've repeated the tests (1st and 2nd) under Linux for the TOS versions
"latest EmuTOS", 1.04, 1.62, 2.06, 3.06, 4.04 using GEMDOS HD emulation.
In addition I did the test (1st and 2nd) on an ACSI harddisk image.
1st Test: create files for all characters and then delete them by
specifying their full name.
2nd Test: create files for all characters and then delete them
manually using the GEM Desktop.
The results are identical for all tested TOS versions. All created
files can be deleted in both cases.
The GEMDOS HD emulation can't create files with '\' or '/' in their name.
The real TOS can't create files with '\' in their name.
The test results for the GEMDOS HD emulation are here:
https://gist.github.com/9f484d68dd76208adb86
The test result for the harddisk image is here:
https://gist.github.com/5b94df54442d7b44bbe3
>> ===================================
>>
>> During the testing I noticed a few other things in the GEMDOS layer:
>>
>> I noticed that the GEMDOS emulation layer can create files on the host
>> with certain characters in the pathname which cannot be found (by
>> Fsfirst) or deleted (by Fdelete) thereafter.
>>
>> Characters on Windows: " * / : < > ? \ |
>> Characters on Linux/OSX: / \
>>
>> I assume this is intentional.
>
> Str_Filename2TOSname() filters characters that are invalid for Atari,
> when emulation populates DTA with host names for the Fsnext() call.
> I.e. it's for host->atari direction.
>
> For atari->host direction, when emulated program itself specifies file
> name e.g. for Fcreate(), only thing done to such file names is clipping
> it to 8+3 characters. Clipping is needed because some Atari programs
> give longer strings (e.g. because 8+3 long file names in binary aren't
> separated by terminating zeros) and real TOS clips them.
>
> The clipping is done in gemdos.c::clip_to_83(). I guess it could be
> moved to str.c, some host character filtering could be added to it,
> and the function renamed e.g. Str_Filename2Host().
>
> I think this should be a separate patch.
>
>
>> But one other behaviour of the GEMDOS emulation is not fully clear to
>> me. The GEMDOS layer replaces certain characters by '+' when returned
>> by Fsfirst/Fsnext:
>>
>> ' * . : ? { } 0x7F
>>
>> I don't understand why this is done. As a result those files are not
>> shown with their original character in the emulated system
>>
>> and you get an error when you try to drag the file into another folder
>> or into the Trash icon.
>>
>> I know the GEMDOS layer had a wildcard '?' inserted to catch such cases,
>> but this would match on other files too, which is why I commented it out..
>> In my view with my patch the replacement of special characters by '+'
>> would no longer be needed at all, what do you think?
>
> There can be exactly one '.' in TOS filenames, it's only the rest
> of '.' characters which are filtered out.
>
> A lot of things could break if you would pass multiple '.' chars through
> Fsnext(). Some Atari programs expect there to be max 3 chars after
> first '.', some after last '.'. This filtering is mainly for files
> coming from elsewhere, things like:
> this.is.file.with.lots.of.dots.txt
>
>
> If you want to remove host->atari filtering for the other characters
> and replace '?' matching for '+' INVALID_CHAR with "[.+]" pattern,
> I'm fine with that *if* it's tested to work with following:
> 1. GEM desktop and file selectors in different TOS versions [1],
> whether all files show up fine in them and can be selected
> 2. several other things using and handling Fsnext() results:
> - replacement file selectors (boxkite, slectric...)
> - programs listing and handling file name patterns:
> GUI copiers, compressors (egale, twoinone, arcshl,
> lzhshell, pacshell, stzip...) etc
I currently don't have the time to go deeper into this topic and all
the required tests...
For me the main question is what to do with the files which receive the
'+' replacement character in Fsfirst/Fsnext. Should such returned paths
be found on the host as they currently are (by replacing the '+' with a
wildcard '?' in the GEMDOS HD layer)? Then they can be deleted, but the
'?' wildcard can also match other files and delete them instead.
Or should they not be deletable unless their full name is specified
without wildcards or replacement characters? This is what I originally
had in my patch, but I have now removed it as you preferred to keep this
separate. I think it may need more discussion.
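The wildcard concern can be shown with a small sketch. This is not Hatari's matching code; replacement_to_wildcard and match_single_wildcard are hypothetical helpers that only model "turn '+' into '?' and let '?' match any single character", ignoring case handling and other pattern characters.

```c
#include <stdbool.h>

/* Replace every occurrence of the replacement character with '?'. */
void replacement_to_wildcard(char *name, char replacement)
{
	for (; *name; name++) {
		if (*name == replacement)
			*name = '?';
	}
}

/* Minimal matcher: '?' matches exactly one arbitrary character,
 * every other pattern character must match literally. */
bool match_single_wildcard(const char *pattern, const char *name)
{
	while (*pattern && *name) {
		if (*pattern != '?' && *pattern != *name)
			return false;
		pattern++;
		name++;
	}
	/* Both strings must end at the same position. */
	return *pattern == '\0' && *name == '\0';
}
```

With the name "A+B.TXT" returned to the emulated system, the derived pattern "A?B.TXT" matches the intended host file "A:B.TXT", but it equally matches "AXB.TXT" or a host file literally named "A+B.TXT", so a delete could hit whichever of them is found first.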
I found something else which may need attention: When you create a file
"atari.log" through the GEMDOS HD layer on the host and then copy another
file "atari.lo" into the same directory, the two files seem to be mapped
to the same host file by the GEMDOS layer. The emulated system reports
that "atari.lo" already exists although this should not be the case.
Max