Re: [eigen] No vectorization in presence of .cast<T>() calls



On 18.12.2010 06:23, Benoit Jacob wrote:
> Tough case.
> 
> Casting from unsigned char to float is expanding 1 byte to 4 bytes,
> which means going from 16 to 4 scalars per 16-byte packet. This change
> in the number of scalars per packet is what's troublesome for our
> vectorization system.
> 
> In general, that's quite hard, but it seems that we can easily
> overcome this in your particular case. Since you're only casting from a
> small type to a bigger type, the expression returned by cast() could
> be vectorizable by implementing packet() by reading LESS THAN a packet
> from the original uchar expression, and expanding it to float.
> 
> something like this (pseudo code):
> 
> packet4f cast<float>(Index i)
> {
>    return packet4f(float(src.coeff(i)), float(src.coeff(i+1)),
> float(src.coeff(i+2)), float(src.coeff(i+3)));
> }

Hm, I guess this would use four slow, unvectorized scalar conversions ...
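
One way to avoid the scalar conversions would be to widen the bytes with integer unpacks and convert all four lanes at once. A minimal sketch, assuming SSE2 and a source that lives in plain memory (the helper name cast4_uchar_to_float is hypothetical, and this is not how Eigen implements cast()):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstring>      // std::memcpy

// Hypothetical helper: reads LESS THAN a packet (only 4 bytes) and
// expands it to one packet of 4 floats.
inline __m128 cast4_uchar_to_float(const unsigned char* src) {
    int bits;
    std::memcpy(&bits, src, sizeof bits);   // load exactly 4 bytes
    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_cvtsi32_si128(bits);    // 4 bytes in the lowest 32 bits
    v = _mm_unpacklo_epi8(v, zero);         // zero-extend 8 -> 16 bit
    v = _mm_unpacklo_epi16(v, zero);        // zero-extend 16 -> 32 bit
    return _mm_cvtepi32_ps(v);              // 4 x int32 -> 4 x float
}
```

This only works directly when the uchar values can be read from memory. If the source is itself a complex expression, its coefficients would have to be evaluated first.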

> This is only going to be beneficial if this is used in a complex
> enough expression to pay for the cost of this packet() method. We must
> make sure not to introduce a performance regression on a simple
> dst=src.cast<float>() example.

Just thinking out loud here ...
First of all, I'd say that casting from a bigger to a smaller type should
be possible with the current vectorization system. Something like this:

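packet4f some_double_expression::cast<float>(Index i){
	// _mm_cvtpd_ps converts 2 doubles to 2 floats in the lower half
	// of the result and zeroes the upper half.
	return swizzle(
		_mm_cvtpd_ps(src.packet(i)),    // floats for i, i+1
		_mm_cvtpd_ps(src.packet(i+2)),  // floats for i+2, i+3
		/* index set picking the lower halves of both */);
}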
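
To make the swizzle concrete: since _mm_cvtpd_ps leaves its two results in the lower half, _mm_movelh_ps is enough to combine them. A minimal sketch, assuming SSE2 and a raw double pointer in place of the Eigen expression (the helper name cast4_double_to_float is hypothetical):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: narrows 4 consecutive doubles (two double
// packets) into one float packet.
inline __m128 cast4_double_to_float(const double* src) {
    __m128 lo = _mm_cvtpd_ps(_mm_loadu_pd(src));      // (f0, f1, 0, 0)
    __m128 hi = _mm_cvtpd_ps(_mm_loadu_pd(src + 2));  // (f2, f3, 0, 0)
    // Move the lower half of hi into the upper half of lo.
    return _mm_movelh_ps(lo, hi);                     // (f0, f1, f2, f3)
}
```

The number of scalars per packet shrinks on the source side only, so the destination still asks for exactly one packet per call, which is why this direction fits the existing packet() interface.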

For the other way around I admit that it's very tricky, especially if
the smaller type also comes from a complex expression. Ideally you would
need to read a single packet of the smaller type and return 2, 4 or 8
consecutive packets of the bigger type -- I have no idea how this could
be done easily and efficiently ...
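
For the simplest case, where the smaller type sits in plain memory, the "one packet in, four packets out" idea would look roughly like this. A minimal sketch, assuming SSE2 (the helper name cast16_uchar_to_float is hypothetical); it does not answer how to fit this into an expression evaluator that asks for one packet at a time:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: reads one packet of 16 unsigned chars and
// produces four consecutive packets of 4 floats each.
inline void cast16_uchar_to_float(const unsigned char* src, __m128 out[4]) {
    const __m128i zero = _mm_setzero_si128();
    __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i lo16 = _mm_unpacklo_epi8(bytes, zero);  // elements 0..7  as 16 bit
    __m128i hi16 = _mm_unpackhi_epi8(bytes, zero);  // elements 8..15 as 16 bit
    out[0] = _mm_cvtepi32_ps(_mm_unpacklo_epi16(lo16, zero));  // 0..3
    out[1] = _mm_cvtepi32_ps(_mm_unpackhi_epi16(lo16, zero));  // 4..7
    out[2] = _mm_cvtepi32_ps(_mm_unpacklo_epi16(hi16, zero));  // 8..11
    out[3] = _mm_cvtepi32_ps(_mm_unpackhi_epi16(hi16, zero));  // 12..15
}
```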

Another point: Has anyone already thought about how to support AVX in the
future? -- Last time I checked there were some hard-coded 16s in the code ...

Christoph

-- 
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: (+49) 421-218-64252
----------------------------------------------


