Re: [eigen] vectorization of complex



2010/7/7 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> Hi,
>
> I've just merged the vectorization of complex<float> and
> complex<double> into the devel branch. They also include some
> optimizations for SSE3. All tests pass with gcc 4.4 in 32 and 64 bits.
> For the rest, well, let's see ;)

Make sure to update the vectorization_logic test so that it covers
complex types as thoroughly as it currently covers real types.

(Sorry, because of the Mozilla summit I have little time :( so I didn't
read through your changes.)

>
> The support for mixing types will continue on this fork.
>
> On Wed, Jul 7, 2010 at 12:14 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> Awesome!
>>
>> Have you found back my old email?
>> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2010/01/msg00096.html
>
> ah thanks I forgot about this one, so let's see:
>
>> 1) Everywhere in Eigen where we have PacketSize==1 conditions, examine
>> if we really mean that or if that is a way of asking if the scalar
>> type has vectorization. Hint: in 95% of cases it is the latter. An
>> exception might be in ei_first_aligned, need to check.
>
> actually, there were only 2 or 3 occurrences...

OK, sounds plausible. I guess there was a big one in particular, about
the dot product.
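To make point 1 concrete: with 128-bit SSE registers, a single complex<double> fills a whole packet, so a PacketSize==1 test would wrongly classify it as non-vectorized. The sketch below assumes SSE-style 128-bit packets and uses a hypothetical packet_info trait (not Eigen's actual API) to contrast the two questions.

```cpp
#include <cassert>
#include <complex>

// Hypothetical per-scalar packet description (illustrative names only).
template <typename Scalar>
struct packet_info {
  static constexpr int size = 1;               // scalars per packet
  static constexpr bool vectorizable = false;  // no SIMD path by default
};

// Assuming 128-bit SSE packets:
template <> struct packet_info<float> {
  static constexpr int size = 4;  // 4 x 32-bit floats
  static constexpr bool vectorizable = true;
};
template <> struct packet_info<double> {
  static constexpr int size = 2;  // 2 x 64-bit doubles
  static constexpr bool vectorizable = true;
};
template <> struct packet_info<std::complex<float>> {
  static constexpr int size = 2;  // 2 x (re, im) float pairs
  static constexpr bool vectorizable = true;
};
template <> struct packet_info<std::complex<double>> {
  static constexpr int size = 1;  // one (re, im) double pair fills the register
  static constexpr bool vectorizable = true;
};

// Fragile check: conflates "one scalar per packet" with "no SIMD".
template <typename Scalar>
constexpr bool looks_scalar_by_size() { return packet_info<Scalar>::size == 1; }

// Intended check: ask the traits whether a SIMD path exists.
template <typename Scalar>
constexpr bool has_simd() { return packet_info<Scalar>::vectorizable; }
```

For complex<double>, both predicates return true, which is exactly the ambiguity point 1 warns about.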

>> 3) Introduce new SIMD functions ei_pconj (conjugate a packet) and
>> ei_pmulconj (compute x*conj(y), useful in dot products etc.).
>> For real numbers, ei_pconj(x) returns x and ei_pmulconj is just like
>> ei_pmul. Implement them for complex<float> and complex<double>. At
>> first, do it only for SSE using instructions like SHUFPS, then we'll
>> see if stuff can factor out with AltiVec...
>
> This is achieved via a more general ei_conj_helper<T0,T1,bool
> Conj0,bool Conj1> object allowing all conjugation configurations as
> well as mixing real and complexes (in the future).
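A minimal scalar sketch of the idea behind ei_conj_helper: two compile-time flags decide which operand is conjugated before multiplying, so x*conj(y) is simply the <false, true> configuration. The names conj_helper, conj_value and dot_xconjy are hypothetical, there is no SIMD here, and the real template takes two scalar types T0 and T1 (for mixed real/complex use), which this sketch collapses into a single T.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Conjugating a real scalar is the identity...
template <typename T>
T conj_value(const T& x) { return x; }

// ...while complex scalars use std::conj (the more specialized overload wins).
template <typename T>
std::complex<T> conj_value(const std::complex<T>& z) { return std::conj(z); }

// Hypothetical helper: Conj0/Conj1 select which operands get conjugated.
template <typename T, bool Conj0, bool Conj1>
struct conj_helper {
  T pmul(const T& x, const T& y) const {
    return (Conj0 ? conj_value(x) : x) * (Conj1 ? conj_value(y) : y);
  }
  T pmadd(const T& x, const T& y, const T& acc) const { return acc + pmul(x, y); }
};

// sum_i x[i] * conj(y[i]): the ei_pmulconj pattern, i.e. <false, true>.
template <typename T>
T dot_xconjy(const T* x, const T* y, std::size_t n) {
  conj_helper<T, false, true> cj;
  T acc(0);
  for (std::size_t i = 0; i < n; ++i) acc = cj.pmadd(x[i], y[i], acc);
  return acc;
}
```

For real T the conjugation compiles away, which matches the requirement that ei_pmulconj behave like ei_pmul on real numbers.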
>
> Some details:
>
> SSE3 provides a nice instruction called addsub, which is perfect for
> computing products of complex numbers. Using this instruction yields a
> significant speedup. So far so good. The problem is that this
> instruction is useless for conjugated multiplications: to use it
> without explicitly conjugating the arguments, you need one more
> shuffle, which kills the performance. So with SSE3 it is actually
> faster to simply do ei_pmul(ei_pconj(a),b). For matrix-matrix
> products, it is even faster to let the conjugations happen during the
> packing of the blocks, so that we always do basic multiplications
> ei_pmul(a,b).
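For reference, a sketch of a two-lane complex<float> product built on SSE3 addsub, assuming packets are laid out as [re0, im0, re1, im1]. It also shows the conjugated case done the simple way described above: flip the sign bit of the imaginary lanes with one XOR, then reuse the plain product. Function names are hypothetical and this is not the code in Eigen/src/Core/arch/SSE/Complex.h; the `target("sse3")` attribute is a GCC/Clang extension that enables SSE3 for those functions only.

```cpp
#include <cassert>
#include <complex>
#include <pmmintrin.h>  // SSE3 intrinsics (also pulls in SSE/SSE2)

// a*b for both complex pairs of a packet, using SSE3 addsub.
__attribute__((target("sse3")))
__m128 cmul_sse3(__m128 a, __m128 b) {
  const __m128 b_re = _mm_moveldup_ps(b);  // [br0, br0, br1, br1]
  const __m128 b_im = _mm_movehdup_ps(b);  // [bi0, bi0, bi1, bi1]
  const __m128 a_sw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1));  // [ai0, ar0, ai1, ar1]
  const __m128 t1 = _mm_mul_ps(a, b_re);     // [ar*br, ai*br, ...]
  const __m128 t2 = _mm_mul_ps(a_sw, b_im);  // [ai*bi, ar*bi, ...]
  // addsub subtracts in even lanes and adds in odd lanes:
  // [ar*br - ai*bi, ai*br + ar*bi, ...]
  return _mm_addsub_ps(t1, t2);
}

// conj(a): flip the sign bit of the imaginary lanes only (plain SSE).
inline __m128 cconj(__m128 a) {
  const __m128 sign_im = _mm_setr_ps(0.0f, -0.0f, 0.0f, -0.0f);
  return _mm_xor_ps(a, sign_im);
}

// conj(a)*b: one XOR, then the ordinary product.
__attribute__((target("sse3")))
__m128 cmul_conj_lhs(__m128 a, __m128 b) {
  return cmul_sse3(cconj(a), b);
}

// Convenience wrapper: out[i] = (conj_a ? conj(a[i]) : a[i]) * b[i], i = 0, 1.
inline void cmul2(const std::complex<float>* a, const std::complex<float>* b,
                  std::complex<float>* out, bool conj_a) {
  const __m128 pa = _mm_loadu_ps(reinterpret_cast<const float*>(a));
  const __m128 pb = _mm_loadu_ps(reinterpret_cast<const float*>(b));
  _mm_storeu_ps(reinterpret_cast<float*>(out),
                conj_a ? cmul_conj_lhs(pa, pb) : cmul_sse3(pa, pb));
}
```

The conjugated path costs one extra XOR on top of the plain product.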
>
> Still on matrix products, higher performance could be achieved by
> explicitly writing code that performs several multiplications at once.
> Indeed, each factor is used twice, and some intermediate results
> (the shuffles) could be reused. Maybe the compiler does that for us; I
> did not check, but I doubt it!
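One possible reading of the reuse idea, as a hypothetical sketch: when one lhs packet a is multiplied by two rhs packets, the swapped copy of a depends only on a, so it can be shuffled once and shared by both products. It assumes the [re0, im0, re1, im1] layout and uses the GCC/Clang `target("sse3")` attribute; whether a compiler already hoists this shuffle on its own is not checked here.

```cpp
#include <cassert>
#include <pmmintrin.h>  // SSE3 intrinsics

// Two complex<float> products that share the same lhs packet.
struct cmul_pair { __m128 p0, p1; };

__attribute__((target("sse3")))
cmul_pair cmul_shared_lhs(__m128 a, __m128 b0, __m128 b1) {
  // Computed once, used by both products: [ai0, ar0, ai1, ar1]
  const __m128 a_sw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1));
  cmul_pair r;
  // Each product: addsub(a * dup(re(b)), a_sw * dup(im(b)))
  r.p0 = _mm_addsub_ps(_mm_mul_ps(a, _mm_moveldup_ps(b0)),
                       _mm_mul_ps(a_sw, _mm_movehdup_ps(b0)));
  r.p1 = _mm_addsub_ps(_mm_mul_ps(a, _mm_moveldup_ps(b1)),
                       _mm_mul_ps(a_sw, _mm_movehdup_ps(b1)));
  return r;
}
```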

OK, thanks a lot Gael for investigating all this!

Isn't it amazing that, with our infrastructure, you could just add
vectorization support for complex and immediately get a super fast
matrix product, along with all our algorithms that use it? I can't wait
to see benchmark results: I wouldn't be surprised if we (i.e. you) beat
some of the big guys on complex matrices!

>
>> 5) The puzzle: what to do about ei_pabs()? In the same vein it would
>> be nice to introduce an ei_pabs2()... but we need to solve the
>> question: what should they return, half a packet of reals???
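To illustrate the shape problem, a hypothetical SSE3 sketch that squares a packet of two complex<float> and sums adjacent lanes. Only two floats carry information, |z0|^2 and |z1|^2, and the upper half merely duplicates them. What such a function should return, and what a caller does with the spare lanes, is exactly the open question, so this is one arbitrary choice rather than a proposal.

```cpp
#include <cassert>
#include <pmmintrin.h>  // SSE3: _mm_hadd_ps

// Input lanes [re0, im0, re1, im1]; output [|z0|^2, |z1|^2, |z0|^2, |z1|^2].
__attribute__((target("sse3")))
__m128 cabs2_half(__m128 z) {
  const __m128 sq = _mm_mul_ps(z, z);  // [re0^2, im0^2, re1^2, im1^2]
  // hadd(x, y) = [x0+x1, x2+x3, y0+y1, y2+y3]; with x == y the top half repeats.
  return _mm_hadd_ps(sq, sq);
}
```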
>
> Yes, that's still an open issue! There are also some cases
> where we would like to pick two consecutive floats (a1, a2), put them
> in a 4-component packet (a1 a1 a2 a2), and then multiply it with a packet
> of two complex<float>. This occurs for coefficient-wise products, some
> configurations of the diagonal product, etc.
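A hypothetical SSE2 sketch of the (a1 a1 a2 a2) trick: load two consecutive floats into the low 64 bits, duplicate each with an unpack, and multiply lane-wise with a packet of two complex<float> laid out as [re0, im0, re1, im1]. Function names are illustrative only.

```cpp
#include <cassert>
#include <complex>
#include <emmintrin.h>  // SSE2

// Turn two consecutive floats p[0], p[1] into the packet [a1, a1, a2, a2].
inline __m128 load_dup_reals(const float* p) {
  // 64-bit load into the low half, upper half zeroed: [a1, a2, 0, 0]
  const __m128 lo = _mm_castsi128_ps(_mm_loadl_epi64(reinterpret_cast<const __m128i*>(p)));
  return _mm_unpacklo_ps(lo, lo);  // [a1, a1, a2, a2]
}

// In place: c[0] *= reals[0], c[1] *= reals[1] (real-by-complex, coefficient-wise).
inline void scale_two_complex(const float* reals, std::complex<float>* c) {
  float* f = reinterpret_cast<float*>(c);
  _mm_storeu_ps(f, _mm_mul_ps(load_dup_reals(reals), _mm_loadu_ps(f)));
}
```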

OK. Well, even without any abs() vectorization, we already have matrix
products, which is the most important part.

Benoit

>
> cheers,
> gael
>
>
>>
>> Not that you need it :-) It just has a couple of small ideas that
>> might still be relevant.
>>
>> Benoit
>>
>> 2010/7/6 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>> Hi all,
>>>
>>> everything is in the title, and this is happening there:
>>>
>>> http://bitbucket.org/ggael/eigen-complex
>>>
>>> complex<float> is already vectorized: a 4.6x speedup compared to
>>> beta1 for a large matrix product :)
>>>
>>> road-map:
>>>
>>> 1 - complex<double>
>>> 2 - mixed real-complex products
>>> 3 - merge
>>>
>>> 4 - optimized implementation for SSE3 and SSE4 (can be done in
>>> parallel to the rest)
>>>
>>> Please let me handle items 1 and 2 because they might require some
>>> nontrivial changes deep inside Eigen, but if anyone wants to have fun
>>> playing with SSE intrinsics, you are very welcome to help with item 4.
>>> Everything is in Eigen/src/Core/arch/SSE/Complex.h.
>>>
>>>
>>> cheers,
>>> gael.


