Re: [AD] fixed point `fsqrt' and `fhypot'

Felix Kuehling <felixyz@xxxxxxxxxx> writes:
> When I tried to optimize the particle effect engine of a freeware game 
> that I'm currently working on, I found out that the `fsqrt' routine is 
> extremely slow. Here's my version which does exactly the same and 
> works 15 times faster on my P133 (it uses a `bsr' instruction instead 
> of a series of comparisons to reduce the range of the input number):

I don't get such a dramatic improvement on my PII 450 (12%), but that's 
still certainly faster! Congratulations.
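
For anyone following the thread without the patch in front of them, the general idea (find the highest set bit, shift the input into a small fixed range, then do a single table lookup) can be sketched in portable C. This is only an illustration, not Felix's routine: the loop in highest_bit() stands in for `bsr', and sketch_table is a made-up table holding floor(sqrt(m) * 4096), not Allegro's sqrt_table. All names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical table, NOT Allegro's sqrt_table: entry m holds
 * floor(sqrt(m) * 4096), which fits in 16 bits for m < 256. */
static uint16_t sketch_table[256];
static int sketch_table_ready = 0;

/* Plain bit-by-bit integer square root, used only to fill the table. */
static uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = (uint32_t)1 << 30;

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

static void build_sketch_table(void)
{
    uint32_t m;

    /* sqrt(m * 2^24) == sqrt(m) * 4096 */
    for (m = 0; m < 256; m++)
        sketch_table[m] = (uint16_t)isqrt32(m << 24);
    sketch_table_ready = 1;
}

/* Index of the highest set bit; this loop stands in for `bsr'. */
static int highest_bit(uint32_t v)
{
    int h = 0;

    while (v >>= 1)
        h++;
    return h;
}

/* Approximate square root of a 16.16 fixed point number. */
int32_t fsqrt_sketch(int32_t x)
{
    uint32_t ux, m;
    int top, shift, e;

    if (x <= 0)
        return 0;   /* assumption: the sketch returns 0 for zero or negative input */
    if (!sketch_table_ready)
        build_sketch_table();

    ux = (uint32_t)x;
    top = highest_bit(ux) | 1;   /* odd bit position, so the shift is even */
    shift = top - 7;             /* brings ux into the range 64..255 */
    m = (shift >= 0) ? (ux >> shift) : (ux << -shift);

    e = shift / 2 - 4;           /* result = table[m] * 2^e */
    return (e >= 0) ? (int32_t)((uint32_t)sketch_table[m] << e)
                    : (int32_t)(sketch_table[m] >> -e);
}
```

The shift is kept even so that halving it gives the exact power of two to put back after the lookup. The result is only as accurate as the top eight bits of the input allow, so exact squares such as 4.0 and 9.0 come out exact but most other inputs are approximations.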

> " movzwl %3(,%0,2), %0;" /* table lookup... */
[...]
> "m" (sqrt_table[0])    /* %3 = address of lookup table */

This doesn't work when building as a shared library on Linux, because 
in position-independent code the address of sqrt_table isn't a constant. 
It can be fixed by changing to:

   " movzwl (%3,%0,2), %0 ; "
[...]
   "r" (sqrt_table)

The same problem occurs in a more serious form with your fhypot() 
function: you cannot reserve %ebx, because it is used for shared library 
relocation in Linux ELF builds. Unfortunately the register usage doesn't 
allow a trivial reshuffle to move this variable into a safe register (%esi 
and %edi are potentially free), but I think it can be made to work if 
the current %ebx is replaced by %eax and the current %eax is moved into 
%esi. That requires an extra copy of the return value at the end when 
built as a shared lib, but I think it's unavoidable.

I haven't actually made this change yet, though: I'll do it when I get 
time, if you don't get there first.

> My game uses the square root for calculating 
> `sqrt(fmul(x,x)+fmul(y,y))', the Pythagoras formula. Unfortunately 
> `fmul(x,x)' will overflow if x > 256. So I always had to use the formula 
> `sqrt(fmul(x>>4,x>>4)+fmul(y>>4,y>>4))<<4' to be able to work with 
> numbers up to 4096. Now I implemented an ASM routine which does this for 
> fixed point numbers in any range by working with 64 bit intermediate 
> results. In analogy to the libc routine `hypot' I named it `fhypot':

Nice idea. I agree, this formula is used frequently enough to be well 
worth optimising. Many thanks for this!
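
As a rough portable illustration of the 64 bit intermediate idea (not your ASM routine, and with hypothetical names throughout): squaring a 16.16 value gives a 32.32 result, so the sum of the two squares can be held in a 64 bit integer, and the integer square root of that sum is already scaled back to 16.16. The saturation at the end is my own assumption about what to do when the result does not fit.

```c
#include <stdint.h>

/* Bit-by-bit integer square root of a 64 bit value (floor of the true root). */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* sqrt(x*x + y*y) for 16.16 fixed point inputs, using a 64 bit sum of squares. */
int32_t fhypot_sketch(int32_t x, int32_t y)
{
    int64_t wx = x, wy = y;

    /* Each square is at most 2^62, so the sum fits in an unsigned 64 bit value. */
    uint64_t sum = (uint64_t)(wx * wx) + (uint64_t)(wy * wy);
    uint32_t root = isqrt64(sum);

    /* Assumption: saturate when the result does not fit a signed 16.16 value. */
    return (root > 0x7FFFFFFFu) ? 0x7FFFFFFF : (int32_t)root;
}
```

With this, fhypot_sketch(3000 << 16, 4000 << 16) gives 5000 << 16 without any pre-shifting of the inputs, which is the range problem the `>>4' workaround was trying to avoid.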

One observation, though: did you try using floating point code in your 
game? On Pentium and above it can often be faster than fixed point 
calculations, and the more recent your processor becomes, the better it 
performs in comparison. Especially for such heavy math stuff, I'd be 
tempted to give it a go...


-- 
Shawn Hargreaves - shawn@xxxxxxxxxx - http://www.talula.demon.co.uk/
"A binary is barely software: it's more like hardware on a floppy disk."


