Re: [hatari-devel] Hatari screen test


On Tue, 26 May 2020 10:56:01 +0300,
Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:

> Hi,
> On 5/25/20 7:54 PM, Thomas Huth wrote:
> > On Mon, 25 May 2020 17:32:55 +0300,
> > Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:
> >> NOTE: I tested the 0xFFFF820A register value
> >> with a breakpoint, and it can have that value
> >> long before the screen looks correct. Therefore
> >> I think something like a VBL wait would be more
> >> robust.
> > 
> > I thought about that, too, but I think the current code should be
> > relatively safe: As far as I can see, the function that checks the
> > command fifo is only polled approx. once per VBL.
> > So between the
> > "hatari-debug r" and the "hatari-shortcut screenshot", there should
> > be at least one VBL, leaving enough time to render a proper
> > picture.  
> Yes, cmd-fifo input is checked in the main loop,
> i.e. only once per VBL, whereas the debugger
> checks on every instruction.
> Your check uses a 0.2 s sleep, i.e. it checks the
> register only at 5 Hz, every 10th or 12th VBL.
> > But if I'm wrong and we hit some unexpected test failures here in
> > the future, sure I'll rework the code in that case.  
> When I changed breakpoint to check register value
> only on VBL change:
> breakpoint  VBL ! VBL  &&  a0 = 0xFFFF820A  :trace :file screenshot.ini
> I still got the wrong picture on the first VBL,
> i.e. what you're doing can still fail if it
> happens to catch the first VBL on which that
> register changes.
> With 50 Hz screen updates, the probability of
> that is 10%.
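
For what it's worth, the quoted 10% figure follows directly from the
sampling ratio; a trivial sketch of the arithmetic:

```shell
# Sanity-check the 10% estimate: polling at 5 Hz samples one out of
# every 10 VBLs at a 50 Hz refresh, so the sampled VBL is the very
# first "register changed" VBL with probability 1/10.
vbl_hz=50
poll_hz=5
vbls_per_poll=$((vbl_hz / poll_hz))
percent=$((100 / vbls_per_poll))
echo "1 in $vbls_per_poll, i.e. $percent%"
```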

I've now run the test in a loop 1000 times, and I never got a test
failure, even after decreasing the sleep time from 0.2 s to 0.1 s. Are
you able to get a failure when running the script?
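
For reference, the stress loop looked roughly like this (the test
invocation here is a placeholder, not the actual script name):

```shell
# Stress-loop sketch: run the screen test many times and count how
# often it fails. run_screen_test is a stand-in for the real script.
run_screen_test() { true; }   # placeholder; substitute the real test
fails=0
i=1
while [ "$i" -le 1000 ]; do
    run_screen_test || fails=$((fails + 1))
    i=$((i + 1))
done
echo "failures: $fails"
```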

But well, if you feel more confident that way, feel free to change the
script to use the debugger instead of the command fifo.

> >> Image comparison might also be slightly shorter
> >> by using ImageMagick "compare" command instead
> >> of "identify" one.  
> > 
> > How do you use "compare" for automatic tests? It rather seems useful
> > for visual inspection of the differences only?  
> Using the return value:
> "
>         Two images are considered similar if their difference
>         according to the specified metric and fuzz value is 0, with
>         the exception of the normalized cross correlation metric
>         (NCC), where two images are considered similar when
>         their normalized cross correlation is 1. The default metric
>         is NCC.
>         The compare program returns 2 on error, 0 if the images are
>         similar, or a value between 0 and 1 if they are not similar.
> "

That does not work for me - the program always returns the same value
no matter whether the pictures are the same or not. Do you get sane
return values with "compare"?
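
For comparison, here is a hedged sketch of how the exit status could
drive the test (file names and the AE metric are illustrative, not
taken from the actual script):

```shell
# compare exits 0 if the images are similar, 1 if they differ, and 2
# on error. Guarded so it degrades gracefully when ImageMagick is
# missing; the images generated here are stand-ins for real shots.
if command -v compare >/dev/null 2>&1 && command -v convert >/dev/null 2>&1; then
    convert -size 4x4 xc:white ref.png    # stand-in reference image
    convert -size 4x4 xc:white shot.png   # identical "screenshot"
    if compare -metric AE ref.png shot.png null: 2>/dev/null; then
        result=match
    else
        result=differ
    fi
    rm -f ref.png shot.png
else
    result=match   # ImageMagick unavailable; treat as skipped
fi
echo "$result"
```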

> In case of EmuTOS, there are slight differences
> between boots, which I think come from small,
> host-impacted interrupt startup timing
> differences.  Those might also affect some test
> outcomes, or is the instruction flow 100%
> reproducible for "--tos none"?

"--tos none" does not enable any interrupts by itself, so there should
not be that much jitter here ... but the full screen test program uses
some interrupts, so at that point in time, there could be some
differences, I think.

