Re: [hatari-devel] Hatari screen test



Hi,

On 5/26/20 7:43 PM, Thomas Huth wrote:
On Tue, 26 May 2020 10:56:01 +0300,
Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:
On 5/25/20 7:54 PM, Thomas Huth wrote:
Yes, cmd-fifo input is checked in the main loop, i.e.
only once per VBL, whereas the debugger checks on
every instruction.

Your check uses a 0.2s sleep, i.e. it checks the
register only at 5 Hz, i.e. on every 10th or 12th VBL
(at 50 Hz or 60 Hz screen refresh).
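
I.e. the fifo-based flow is roughly this sketch (the completion
check and the file names are placeholders, not the actual
run_test.sh contents):

    mkfifo cmd.fifo
    hatari --cmd-fifo cmd.fifo screentest.prg &

    # poll at 5 Hz; each sleep spans ~10 VBLs at 50 Hz refresh
    while ! test -f test-done.flag; do
        sleep 0.2
    done

    # ask Hatari to take the screenshot and exit
    echo "hatari-shortcut screenshot" > cmd.fifo
    echo "hatari-shortcut quit" > cmd.fifo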

But if I'm wrong and we hit some unexpected test failures here in
the future, sure I'll rework the code in that case.

When I changed the breakpoint to check the register value
only on VBL change:

breakpoint  VBL ! VBL  &&  a0 = 0xFFFF820A  :trace :file screenshot.ini
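
(Such a breakpoint can also be set non-interactively; a sketch,
assuming Hatari's --parse option for reading debugger commands from
a file, with file names of my own choosing:

    echo 'breakpoint  VBL ! VBL  &&  a0 = 0xFFFF820A  :trace :file screenshot.ini' > bp.ini
    hatari --parse bp.ini screentest.prg
)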

I still got the wrong picture on the first VBL.

I.e. what you're doing can still fail, if it
happens to catch the first VBL on which that register
changes.

With 50 Hz screen updates, the 0.2 s sleep spans
10 VBLs, so the probability of that is 10%.

I've now run the test in a loop 1000 times, and I never got a test
failure, even after decreasing the sleep time from 0.2 to 0.1. Are you
able to get a failure when running the run_test.sh script?
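
(Roughly like this sketch, assuming run_test.sh exits with a
non-zero status on failure:

    fails=0
    for i in $(seq 1000); do
        ./run_test.sh || fails=$((fails + 1))
    done
    echo "$fails failures out of 1000 runs"
)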

No.  Test always says it passed.

I haven't looked, but maybe there's e.g. a one-instruction difference between when Hatari's internal nVBL variable changes and when the main loop handles cmd-fifo input.

If the register value is changed on the first instruction of a VBL, then your check might always match one VBL later than the one the debugger catches.

I'll check that.


But well, if you feel more confident that way, feel free to change the
script to use the debugger instead of the command fifo.

Ok, thanks.

Using the debugger, the test finishes slightly earlier because the check is more exact, although the breakpoint may marginally slow Hatari down. I think it's still a win.


Image comparison might also be slightly shorter
when using the ImageMagick "compare" command instead
of the "identify" one.

How do you use "compare" for automatic tests? It rather seems useful
only for visual inspection of the differences?

Using the return value:
"
Two images are considered similar if their difference according to
the specified metric and fuzz value is 0, with the exception of the
normalized cross correlation metric (NCC), where two images are
considered similar when their normalized cross correlation is 1.
The default metric is NCC.

The compare program returns 2 on error, 0 if the images are similar,
or a value between 0 and 1 if they are not similar.
"

That does not work for me - the program always returns the same value,
no matter whether the pictures are the same or not. Do you get sane
return values with "compare"?

If I do just:

compare -fuzz 0 -metric NCC flixref.png test.png compare.png

where test.png is a copy of flixref.png, I get a result of 1, but the
compare.png output image also isn't just white. I think some metric
other than the default NCC needs to be used, but the manual
page doesn't list the alternatives.
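
(They can apparently be listed with "compare -list metric". E.g. the
absolute error count might be simpler to script against; a sketch,
assuming "-metric AE" prints the number of differing pixels to
stderr as the ImageMagick documentation describes:

    diff=$(compare -metric AE flixref.png test.png null: 2>&1)
    if [ "$diff" = "0" ]; then
        echo "images match"
    else
        echo "images differ ($diff pixels)"
    fi
)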

I'll look more into this too. I'm sure somebody at work had used compare for this purpose...


In the case of EmuTOS, there are slight differences between boots,
which I think come from small, host-influenced interrupt startup
timing differences.  Those might also affect some tests'
outcomes, or is the instruction flow 100% reproducible
with "--tos none"?

"--tos none" does not enable any interrupts by itself, so there should
not be that much jitter here ... but the full screen test program uses
some interrupts, so at that point in time, there could be some
differences, I think.

Ok, I may investigate that at some point with the profiler.


	- Eero



