RE: [AD] Faster hsv_to_rgb()

Archived on lists.liballeg.org/allegro-developers


>> One nice thing about 0.8 fixed-point numbers is that I can make a 64K
>> multiply lookup table for all values from 00-FF (they're one byte each).
>
>
>Are you sure this would speed things up? Cache misses are far, far more
>expensive than the latency of multiplies.


I tried replacing the lookup table macro
#define cpMUL8(a,b) (G_caMultTab[((a)<<8)+(b)])
with the following 0.8 multiply macro
#define cpMUL8(a,b) (((a)*(b))>>8)

Looks like you're right. When I replaced the lookup-table macro with the
multiply macro, I got a small speed increase. I haven't written a program
that tests only the speed of the multiply or of hsv_to_rgb(); I just measured
it in the program I was working on, which pre-calculates the HSV values of an
image's pixels and then, on each iteration, applies an effect and calls
hsv_to_rgb() for every pixel of the image.
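For an isolated measurement, something like the following micro-benchmark could time the two cpMUL8 variants on their own. It is only a sketch under assumptions: the names fill_table, bench, MUL8_TABLE and MUL8_SHIFT are hypothetical, the table is filled with the truncating (a*b)>>8 values so both variants must produce the same checksum, the operands come from a simple LCG (a near-worst-case random access pattern, unlike real per-pixel data), and clock() is a coarse timer.

```c
#include <time.h>

/* Hypothetical stand-ins for the two cpMUL8 variants from the post. */
static unsigned char mult_tab[65536];
#define MUL8_TABLE(a, b) (mult_tab[((a) << 8) + (b)])
#define MUL8_SHIFT(a, b) (((a) * (b)) >> 8)

/* Fill the table with exactly what the shift macro computes,
   so both variants return identical results. */
static void fill_table(void)
{
    for (int y = 0; y < 256; y++)
        for (int x = 0; x < 256; x++)
            mult_tab[(y << 8) + x] = (unsigned char)((y * x) >> 8);
}

/* Run n 0.8 multiplies on pseudo-random operands.
   Returns a checksum (so the work cannot be optimised away)
   and stores the elapsed CPU time in *secs. */
static unsigned long bench(int use_table, unsigned long n, double *secs)
{
    unsigned long sum = 0;
    unsigned int seed = 12345u;              /* LCG state */
    clock_t start = clock();
    for (unsigned long i = 0; i < n; i++) {
        seed = seed * 1103515245u + 12345u;  /* unsigned wrap-around is defined */
        unsigned int a = (seed >> 8) & 0xFF;
        unsigned int b = (seed >> 16) & 0xFF;
        sum += use_table ? MUL8_TABLE(a, b) : MUL8_SHIFT(a, b);
    }
    *secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

Call fill_table() once, then compare the times reported by bench(1, n, &t) and bench(0, n, &t) for a large n. Results on a real image may differ, since neighbouring pixels usually hit nearby table entries.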

I am using a 600 MHz AMD Athlon with a 64K data cache, 64K instruction cache,
512K L2 cache and 256MB of 100 MHz RAM (CAS 2 latency). I wonder how these
results would compare on newer processors with larger L1/L2 caches, on older
processors where multiplies are slower and smaller caches make cache misses
more likely, or, for that matter, on non-x86 processors.

However, I slightly tweaked the lookup table so that 0.FF * 0.FF = 0.FF
instead of the 0.FE that the multiply macro returns, so the results use the
full 00-FF range. This makes 0.FF act as 1.00 when the numbers are treated as
values in the range 0-1. Here is the code I used to generate the lookup
table.
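In formula terms, the tweaked table rounds the product scaled by 255 (T below), whereas the macro ((a)*(b))>>8 truncates the product scaled by 256 (M below); the symbols T and M are just labels for this comparison:

```latex
\begin{aligned}
T(x,y) &= \left\lfloor \frac{xy}{255} + \frac{1}{2} \right\rfloor, &
M(x,y) &= \left\lfloor \frac{xy}{256} \right\rfloor, \qquad x, y \in \{0,\dots,255\} \\
T(255,255) &= \left\lfloor 255 + \frac{1}{2} \right\rfloor = 255, &
M(255,255) &= \left\lfloor \frac{65025}{256} \right\rfloor = 254
\end{aligned}
```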


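unsigned char G_caMultTab[65536];

/* Fill G_caMultTab so that entry [(y<<8)+x] holds x*y/255 rounded to the
   nearest integer; this makes 0xFF * 0xFF give 0xFF. */
void
cpPreCalcMultTab(void)
{
	int x,y;
	for(y=0;y<256;y++)
	{
		for(x=0;x<256;x++)
		{
			float nTemp = ((y/255.0f) * (x/255.0f)) * 255.0f;
			G_caMultTab[(y<<8)+x] = (unsigned char)(nTemp + 0.5);
		}
	}
}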
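If the main reason for the table is the 0.FF * 0.FF = 0.FF behaviour, the same rounded values can probably be had without any memory access by using a widely used integer trick for dividing an 8-bit product by 255 with rounding. This is not from the thread, and the names cpMul8Round, table_entry and count_mismatches are hypothetical. Since x*y/255 is never exactly halfway between two integers and the float error in the table code is tiny, the two should agree on every input pair, which count_mismatches() checks by recomputing the table formula:

```c
/* Rounded 0.8 multiply without a table: round(a * b / 255)
   for a, b in 0..255, using one multiply, two shifts and two adds. */
static inline unsigned int cpMul8Round(unsigned int a, unsigned int b)
{
    unsigned int t = a * b + 128;   /* add half the divisor for rounding */
    return (t + (t >> 8)) >> 8;     /* (t + t/256) / 256, exact for this range */
}

/* Same value cpPreCalcMultTab() stores at index (y<<8)+x. */
static unsigned char table_entry(int y, int x)
{
    float nTemp = ((y / 255.0f) * (x / 255.0f)) * 255.0f;
    return (unsigned char)(nTemp + 0.5);
}

/* Count input pairs where the table and the arithmetic version differ. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (cpMul8Round((unsigned int)a, (unsigned int)b) != table_entry(a, b))
                bad++;
    return bad;
}
```

count_mismatches() should return 0, so a macro built on cpMul8Round() could replace the table lookup while keeping the full 00-FF range.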




>> Do you think Allegro has a need for separate, faster but less accurate
>> RGB<->HSV functions that use a different parameter space, or should it be up
>> to users to add their own versions of these functions?
>
>Perhaps best left as an add-on.

I think you're right. Once I release my project, I will release its source
code as well.


>but my API is very interested in being broken... in the name of
>speed/optimization.
>
>Please may I have the code,
>and/or any documentation, or a test program?

See above


>Can you see any optimizations that might be done on rgb_to_hsv() that do
>NOT break the API?

See Sven's code for such optimisations.


>I tested it (ran through many h,s,v triples and compared with previous
>algorithm), and it seems correct. It's a bit faster, but not as much as
>one would expect: on my computer the new version takes about 95% of the
>time that the old version took.

I think the main speed bottleneck is converting the parameters into the
format the equation needs. When I was writing my code, I gave hsv_to_rgb()
as much optimisation as I could, since I use it in the calculation for every
pixel on every frame (which is why rgb_to_hsv() got less attention). This
mainly involved changing the parameters to 0.8 fixed-point and returning the
RGB value as a long instead of writing through pointers to R, G and B. While
optimising, I also noticed that by rearranging the expressions I could shave
off a few multiplies (that is the optimisation I submitted), but that saving
is overshadowed by the divide and the multiply used to get the H and V
parameters into range, which is why the speed increase is small.
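To make the parameter change concrete, here is a sketch of what a 0.8 fixed-point HSV-to-RGB function returning a packed value might look like. It is not the code described above: the name hsv8_to_rgb, the 0xRRGGBB packing, and mapping the whole hue circle onto 0..255 are assumptions, and it uses the standard sector/fraction HSV formulas with a rounded 0.8 multiply. With hue stored as a fraction of the circle, the sector and the position within it come from one multiply by 6 and a shift:

```c
/* Rounded 0.8 multiply: round(a * b / 255) for a, b in 0..255. */
static unsigned int mul8(unsigned int a, unsigned int b)
{
    unsigned int t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical 0.8 fixed-point HSV -> RGB.
   h: hue as a fraction of the full circle (0..255 ~ 0..360 degrees, assumed)
   s, v: 0..255 meaning 0.0..1.0
   Returns 0xRRGGBB in an unsigned long instead of writing via pointers. */
unsigned long hsv8_to_rgb(unsigned char h, unsigned char s, unsigned char v)
{
    unsigned int h6 = (unsigned int)h * 6;  /* 0..1530 */
    unsigned int sector = h6 >> 8;          /* 0..5 */
    unsigned int f = h6 & 0xFF;             /* position within the sector */

    unsigned int p = mul8(v, 255u - s);                 /* v(1 - s)        */
    unsigned int q = mul8(v, 255u - mul8(s, f));        /* v(1 - s f)      */
    unsigned int t = mul8(v, 255u - mul8(s, 255u - f)); /* v(1 - s(1 - f)) */
    unsigned int r, g, b;

    switch (sector) {
    case 0:  r = v; g = t; b = p; break;
    case 1:  r = q; g = v; b = p; break;
    case 2:  r = p; g = v; b = t; break;
    case 3:  r = p; g = q; b = v; break;
    case 4:  r = t; g = p; b = v; break;
    default: r = v; g = p; b = q; break;    /* sector 5 */
    }
    return ((unsigned long)r << 16) | ((unsigned long)g << 8) | b;
}
```

For example, hsv8_to_rgb(0, 255, 255) gives 0xFF0000 (red) and hsv8_to_rgb(128, 255, 255) gives 0x00FFFF (cyan, hue 180 degrees).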

Comparing the two versions now would add a slowdown of its own, because I
would have to convert the parameters back into the old format, and rewriting
the code to use the old-format parameters just to test the speed difference
would take too much time.
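For reference, Allegro's existing hsv_to_rgb() takes h in degrees (0-360) and s, v in 0-1 as floats, so a side-by-side test would need a conversion step like the one sketched below before calling a 0.8 version. The helper name hsv_float_to_8 and the hue mapping (360 degrees onto 256 steps, wrapping 360 to 0) are assumptions; the float multiply and divide it performs per call are the kind of overhead that would distort such a timing comparison.

```c
/* Hypothetical helper: convert float HSV to 0.8 fixed-point.
   h in degrees (assumed 0..360), s and v assumed to be in 0..1. */
void hsv_float_to_8(float h, float s, float v,
                    unsigned char *h8, unsigned char *s8, unsigned char *v8)
{
    int hi = (int)(h * 256.0f / 360.0f);        /* 360 degrees -> 256 steps */
    *h8 = (unsigned char)(hi & 0xFF);           /* 360 wraps back to 0 */
    *s8 = (unsigned char)(s * 255.0f + 0.5f);   /* round to nearest */
    *v8 = (unsigned char)(v * 255.0f + 0.5f);
}
```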



AE.




