Re: [AD] About Elias' bug
> If the test surrounds a few instructions then I bet that the
> loop is left in place.
Yes; it turns out it's the loop optimizer's fault: it does a very bad job when
playing with the x86 registers, likely due to their scarcity.

   for (y=0; y<h; y++) {
      s = bmp_read_line(src, s_y+y) + s_x*ssize;
      d = bmp_write_line(dest, d_y+y) + d_x*dsize;
      if (_color_conv & COLORCONV_DITHER_HI) {
         for (x=0; x<w; x++) {
            bmp_select(src);
            c = bmp_read##sbits(s);
            g = (c >> 1);
            b = getb##sbits(c);
            bmp_select(dest);
            bmp_write##dbits(d, makecol##dbits##_dither(r, g, b, x, y));
            s += ssize;
            d += dsize;
         }
      }
      else {
         for (x=0; x<w; x++) {
            bmp_select(src);
            c = bmp_read##sbits(s);
            r = getr##sbits(c);
            g = getg##sbits(c);
            b = getb##sbits(c);
            bmp_select(dest);
            bmp_write##dbits(d, makecol##dbits(r, g, b));
            s += ssize;
            d += dsize;
         }
      }
   }

The COLORCONV_DITHER_HI bit is never set, so the 'else' branch is always
executed.
The unmodified code runs at 43.4 blits/second.
If I remove the test and keep only the 'else' branch, the code runs at 34
blits/second.
Even more amazing, if I only replace 'g = (c >> 1)' with 'g = c', the code
runs at 34 blits/second too.
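
The message does not say how the blits/second figures were obtained. As a rough sketch only, a rate like this could be measured with the standard clock() function. The names ops_per_second, measure_rate and count_call are hypothetical and not part of Allegro, and clock() reports CPU time rather than wall-clock time, so the absolute numbers would differ from one setup to another.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: turns an operation count and an elapsed number
   of clock() ticks into operations per second. */
static double ops_per_second(long count, clock_t ticks)
{
   if (ticks <= 0)
      return 0.0;   /* interval too short to measure */
   return (double)count / ((double)ticks / CLOCKS_PER_SEC);
}

/* Hypothetical helper: calls op(arg) 'count' times and returns the rate. */
static double measure_rate(void (*op)(void *), void *arg, long count)
{
   clock_t start;
   long i;

   assert(count > 0);
   start = clock();
   for (i = 0; i < count; i++)
      op(arg);
   return ops_per_second(count, clock() - start);
}

/* Trivial stand-in for a blit: just counts how often it was called. */
static void count_call(void *arg)
{
   ++*(long *)arg;
}
```

A real test would pass a function that performs one blit instead of count_call, and would run enough iterations for the elapsed time to be well above the clock resolution.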
It looks like the optimizer thinks it absolutely needs a register to perform
'c >> 1' so it reserves %ebx to do it in the 'if' branch. This also frees
%ebx for the 'else' branch and every shift operation is performed inside
registers (%edx for r, %eax for g and %ebx for b) in this branch.
Now, without the 'c >> 1' instruction, %ebx is not reserved in the 'if'
branch and hence is not available for the 'else' branch. The shift
operations for b then end up being done directly on the stack...
The best solution would be of course to write asm code by hand, but I'm a
little fed up writing asm color conversion code :-)
I'll try instead to gradually change the current C code.
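
One possible direction for such a change, given here only as a sketch under assumptions, is to test the flag once outside the loops and give each branch its own row function, so that the register needs of the rarely used branch have less chance of leaking into the common one. Everything below is hypothetical: SKETCH_DITHER stands in for COLORCONV_DITHER_HI, the pixel layout is assumed to be 32-bit 8-8-8 to 16-bit 5-6-5, the bitmaps are plain arrays rather than Allegro bitmaps, and the dither is a toy checkerboard bias, not Allegro's dithering. The compiler may still inline both helpers, so whether this helps would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag standing in for COLORCONV_DITHER_HI. */
#define SKETCH_DITHER 1

/* Packs 8-bit r, g, b into a 16-bit 5-6-5 pixel by truncation. */
static uint16_t pack565(unsigned r, unsigned g, unsigned b)
{
   return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Adds a bias to an 8-bit component, saturating at 255. */
static unsigned add_sat(unsigned v, unsigned bias)
{
   v += bias;
   return (v > 255) ? 255 : v;
}

/* Plain row conversion: the only loop the common case runs. */
static void row_plain(const uint32_t *s, uint16_t *d, int w)
{
   int x;
   for (x = 0; x < w; x++) {
      uint32_t c = s[x];
      d[x] = pack565((c >> 16) & 0xFF, (c >> 8) & 0xFF, c & 0xFF);
   }
}

/* Toy checkerboard dither (an assumption, not Allegro's algorithm). */
static void row_dither(const uint32_t *s, uint16_t *d, int w, int y)
{
   int x;
   for (x = 0; x < w; x++) {
      uint32_t c = s[x];
      unsigned bias = ((x ^ y) & 1) ? 4 : 0;
      d[x] = pack565(add_sat((c >> 16) & 0xFF, bias),
                     add_sat((c >> 8) & 0xFF, bias / 2),
                     add_sat(c & 0xFF, bias));
   }
}

/* The flag is tested once, outside both loops. */
void sketch_blit_32_to_16(const uint32_t *src, uint16_t *dst,
                          int w, int h, int flags)
{
   int y;

   assert(w >= 0 && h >= 0);
   if (flags & SKETCH_DITHER) {
      for (y = 0; y < h; y++)
         row_dither(src + (size_t)y * w, dst + (size_t)y * w, w, y);
   }
   else {
      for (y = 0; y < h; y++)
         row_plain(src + (size_t)y * w, dst + (size_t)y * w, w);
   }
}
```

The trade-off is some duplicated loop code in exchange for two small loops that the register allocator can treat separately.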
Never trust compilers :-)
--
Eric Botcazou
ebotcazou@xxxxxxxxxx