Re: [eigen] std::complex vectorization braindump
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] std::complex vectorization braindump
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Wed, 13 Jan 2010 22:08:09 -0500
2010/1/13 Mark Borgerding <mark@xxxxxxxxxxxxxx>:
> Benoit Jacob wrote:
>>
>> [snip]
>>
>> *** General problems ***
>>
>> Vectorizing std::complex breaks at least two assumptions that Eigen is
>> currently making:
>> 1) that vectorized paths can assume real numbers
>> ---> For example, in Dot.h the vectorized paths don't bother conjugating.
>> 2) that PacketSize==1 means no vectorization
>> ---> This breaks for complex<double>: one complex<double> is 16 bytes,
>> so a 128-bit packet holds exactly one of them.
>>
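Both broken assumptions can be shown in a few lines. This is a sketch assuming a 128-bit register and a Hermitian dot product that conjugates the first operand (that convention is an assumption here, not taken from Dot.h). The `packet_size` helper and the two dot functions are hypothetical, not Eigen's traits or API.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumption: 128-bit SIMD registers, as discussed in this thread.
constexpr std::size_t kRegisterBytes = 16;

// Hypothetical helper: how many scalars of type T fit in one register.
template <typename T>
constexpr std::size_t packet_size() { return kRegisterBytes / sizeof(T); }

// complex<double> is two doubles (16 bytes), so exactly one fits:
// PacketSize == 1 even though the arithmetic is genuinely vectorized.
static_assert(packet_size<std::complex<double>>() == 1, "one complex<double> per register");
static_assert(packet_size<std::complex<float>>() == 2, "two complex<float> per register");
static_assert(packet_size<double>() == 2, "two doubles per register");

using cd = std::complex<double>;

// What a "real numbers only" fast path computes: sum of a[i] * b[i].
cd dot_no_conj(const std::vector<cd>& a, const std::vector<cd>& b) {
    assert(a.size() == b.size());
    cd sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Hermitian dot product, conjugating the first operand (assumed convention).
cd dot_hermitian(const std::vector<cd>& a, const std::vector<cd>& b) {
    assert(a.size() == b.size());
    cd sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += std::conj(a[i]) * b[i];
    return sum;
}
```

For v = (1+2i), the unconjugated path gives -3+4i, while the Hermitian product gives the squared norm 5, so skipping conjugation silently changes the result.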
>
> [snip]
>>
>> 5) The puzzle: what to do about ei_pabs()? In the same vein, it would
>> be nice to introduce an ei_pabs2()... but we need to answer the
>> question: what should they return, half of a packet of reals???
>>
>>
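To make the ei_pabs2() puzzle concrete, here is a scalar emulation, assuming a 128-bit packet of 4 floats that stores 2 complex<float> interleaved as real/imaginary pairs. `Packet4f` and `pabs2_complex` are hypothetical names, not Eigen's API.

```cpp
#include <array>
#include <cassert>

// Hypothetical emulation of a 128-bit packet holding 4 floats.
using Packet4f = std::array<float, 4>;

// Input: 2 complex<float>, interleaved as [re0, im0, re1, im1].
// Output: only 2 meaningful floats, i.e. half of a real Packet4f.
std::array<float, 2> pabs2_complex(const Packet4f& p) {
    return {p[0] * p[0] + p[1] * p[1],
            p[2] * p[2] + p[3] * p[3]};
}
```

The return type is the crux: a full complex packet maps to a result that fills only half of a real packet.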
>
> Is a packet limited to 128 bits?
Currently yes, though this is bound to change the day a new SIMD
instruction set comes out.
> If longer is possible, this may have some advantages:
> 1. PacketSize remains > 1 (a minor benefit: it avoids a short-term code scrub)
> 2. Unrolls some loops. (see below)
> 3. The complex-to-real problem (e.g. abs2) goes away
>
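Point 3 can be illustrated under the hypothetical assumption that real packets stay at 128 bits while a complex packet is allowed to be twice as long (256 bits, i.e. 4 complex<float>). With that layout, abs2 maps one complex packet to exactly one full real packet. The types and `pabs2` below are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical layout: 128-bit real packets, 256-bit complex packets.
using Packet4f  = std::array<float, 4>;  // 4 floats
using Packet4cf = std::array<float, 8>;  // 4 complex<float>: [re0, im0, re1, im1, ...]

// abs2 of a double-width complex packet fills exactly one real packet,
// so no "half packet" is left over.
Packet4f pabs2(const Packet4cf& z) {
    Packet4f r{};
    for (std::size_t k = 0; k < 4; ++k) {
        float re = z[2 * k];
        float im = z[2 * k + 1];
        r[k] = re * re + im * im;
    }
    return r;
}
```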
> I've found a certain amount of loop unrolling very beneficial to speed, even
> with SIMD.
> For example, loading 4 SIMD registers, working on them, and then storing the
> results can be much faster than processing them one at a time.
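A portable sketch of that kind of unrolling: each iteration loads 4 packets' worth of data, operates on them, then stores, followed by a packet loop and a scalar tail for leftovers. `Packet2d`, `pload`, `pstore` and `padd` are hypothetical stand-ins for SIMD registers and intrinsics (they only echo Eigen's ei_p* naming style); a real version would use the instruction set's own load/add/store operations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2-lane packet of doubles standing in for a 128-bit register.
struct Packet2d { double v[2]; };

inline Packet2d pload(const double* p) { return {{p[0], p[1]}}; }
inline void pstore(double* p, const Packet2d& x) { p[0] = x.v[0]; p[1] = x.v[1]; }
inline Packet2d padd(const Packet2d& a, const Packet2d& b) {
    return {{a.v[0] + b.v[0], a.v[1] + b.v[1]}};
}

// dst[i] = a[i] + b[i], handling 4 packets (8 doubles) per main-loop iteration.
void add_unrolled(const double* a, const double* b, double* dst, std::size_t n) {
    const std::size_t P = 2;         // doubles per packet
    const std::size_t step = 4 * P;  // 4 packets per iteration
    std::size_t i = 0;
    for (; i + step <= n; i += step) {
        // Load all four register pairs first...
        Packet2d a0 = pload(a + i),         b0 = pload(b + i);
        Packet2d a1 = pload(a + i + P),     b1 = pload(b + i + P);
        Packet2d a2 = pload(a + i + 2 * P), b2 = pload(b + i + 2 * P);
        Packet2d a3 = pload(a + i + 3 * P), b3 = pload(b + i + 3 * P);
        // ...then compute and store.
        pstore(dst + i,         padd(a0, b0));
        pstore(dst + i + P,     padd(a1, b1));
        pstore(dst + i + 2 * P, padd(a2, b2));
        pstore(dst + i + 3 * P, padd(a3, b3));
    }
    for (; i + P <= n; i += P)  // leftover whole packets
        pstore(dst + i, padd(pload(a + i), pload(b + i)));
    for (; i < n; ++i)          // scalar tail
        dst[i] = a[i] + b[i];
}
```

Issuing independent loads before the arithmetic gives the CPU several register chains to overlap, which is the usual explanation for the speedup Mark describes.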
I see. This kind of unrolling is what we call peeling, and I can
believe that in some cases it brings benefits. Normally I would like
to keep peeling completely orthogonal to vectorization, as these are
two unrelated topics. But I have to admit that I don't currently have
any reasonable solution to the complex-to-real problem, so I have to
be open to all possibilities! One should just keep in mind that there
is always the brutal option of not vectorizing abs and abs2 for
complex numbers: after all, these aren't performance-critical in most
cases. For example, they aren't used in the matrix products, which is
where the matrix algorithms spend almost all their time (once they're
optimized). So, just vectorizing add/mul/mulconj will already be very
nice.
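For the operations that matter most, a complex product can be built entirely from real lane operations (broadcast, swap, multiply, add/subtract), which is why add/mul/mulconj vectorize cleanly. The sketch below emulates one complex<double> in a 2-lane packet; "mulconj" is assumed here to mean a * conj(b), and all names are hypothetical.

```cpp
#include <cassert>

// Hypothetical 2-lane packet; when it holds a complex<double>,
// lo is the real part and hi the imaginary part.
struct Packet1cd { double lo, hi; };

// Lane helpers standing in for SIMD broadcast/shuffle/multiply instructions.
inline Packet1cd dup_lo(Packet1cd a) { return {a.lo, a.lo}; }
inline Packet1cd dup_hi(Packet1cd a) { return {a.hi, a.hi}; }
inline Packet1cd swap_lanes(Packet1cd a) { return {a.hi, a.lo}; }
inline Packet1cd lane_mul(Packet1cd a, Packet1cd b) { return {a.lo * b.lo, a.hi * b.hi}; }

// (ar + i ai)(br + i bi) = [ar*br - ai*bi, ar*bi + ai*br]
inline Packet1cd pmul_complex(Packet1cd a, Packet1cd b) {
    Packet1cd t1 = lane_mul(dup_lo(a), b);              // [ar*br, ar*bi]
    Packet1cd t2 = lane_mul(dup_hi(a), swap_lanes(b));  // [ai*bi, ai*br]
    return {t1.lo - t2.lo, t1.hi + t2.hi};              // subtract low lane, add high lane
}

// a * conj(b) = [ar*br + ai*bi, ai*br - ar*bi]
inline Packet1cd pmulconj_complex(Packet1cd a, Packet1cd b) {
    Packet1cd t1 = lane_mul(dup_lo(a), b);              // [ar*br, ar*bi]
    Packet1cd t2 = lane_mul(dup_hi(a), swap_lanes(b));  // [ai*bi, ai*br]
    return {t1.lo + t2.lo, t2.hi - t1.hi};
}
```

The final step of `pmul_complex` (subtract in the low lane, add in the high lane) is exactly what SSE3's `addsubpd` instruction does, so on such hardware the whole product takes only a handful of instructions.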
Benoit
>
>
> -- Mark