Re: [eigen] New(?) way to make using SIMD easier
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] New(?) way to make using SIMD easier
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Tue, 24 Nov 2009 11:51:21 -0500
Ah, and also:
if you just want a generic, easy-to-use way of performing a SIMD
operation on arrays in memory, you can do something even simpler:
just use Map and do your operation on that, like:
VectorXf::Map(dstPtr,num)
= VectorXf::Map(srcPtr1,num)
+ VectorXf::Map(srcPtr2,num);
That compiles to just what you wanted, except that it adds some
code to deal with unaligned boundaries. If 'num' is known at
compile time, you can avoid that by using Matrix<float,num,1> instead
of VectorXf.
If you want a generic operation instead of '+', see unaryExpr() (or
binaryExpr() for a functor that takes two operands).
Benoit
2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> 2009/11/24 Mark Borgerding <mark@xxxxxxxxxxxxxx>:
>> Just an idea:
>> What if the user could write code like:
>>
>> VectorOperator( std::plus<SomeDataType>() , dstPtr , srcPtr1, srcPtr2 ,
>> num );
>>
>> which would use a SIMD-optimized call if one exists, and use a generic
>> algorithm otherwise.
>
> There seem to be 2 aspects in your proposal:
>
> 1) providing a uniform interface using functors
>
> ---> we already have that: we have little functors encapsulating all
> sorts of operations and they provide a packetOp() method that does
> just that. See Functors.h. However they don't take care of
> loading/storing to memory, see next point (they take pre-loaded
> packets/registers).
>
> 2) providing functions that do load+operation+store instead of
> requiring one to call ei_pload and ei_pstore.
>
> ---> but I don't think that's a good idea, because it means that when
> complex expressions are compiled, a lot of redundant loads/stores happen.
> For example consider what would happen if we did implement operator+
> using such a function. Then compiling
> u+v+w
> would result in basically:
>
> 1. load u from memory
> 2. load v from memory
> 3. add them, store to memory (temporary)
> 4. load that temporary again from memory
> 5. load w from memory
> 6. add them
> 7. store
>
> Here, the store in step 3 and the reload in step 4 are redundant. Using
> expression templates allows us to avoid compiling them. This is why we
> do not use combined functions for load+op+store.
>
> Just fyi here is basically the code that eigen emits for u+v+w,
> assuming for example Vector4f (so you also see how these ei_p*
> functions are used):
>
> Packet4f pu = ei_pload(u.data());
> Packet4f pv = ei_pload(v.data());
> Packet4f t = ei_padd(pu,pv);
> Packet4f pw = ei_pload(w.data());
> Packet4f result = ei_padd(t,pw);
>
> Here this Packet4f type is a typedef for a built-in type (e.g. __m128
> on SSE) that the compiler recognizes as a SIMD packet and knows how to
> store as a SIMD register (e.g. xmm0).
>
> See Core/arch/SSE/PacketMath.h
>
>> It might even be used to detect special conditions, e.g. if CUDA processing
>> is enabled and the source pointers are device memory, it performs all
>> calculations on the device, then brings the result back to the host only if
>> the destination resides in host memory.
>
> We wouldn't want to do any runtime branching in such a small and
> frequently-called function; if one wants to detect these things at
> runtime, one needs to do it at a wider level, otherwise too much time
> would be wasted in ifs.
>
> If there is something that you wanted to do and didn't see how to do
> using Eigen's current infrastructure, can we help?
>
> Cheers,
> Benoit
>
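
For readers who have not seen expression templates before, here is a
rough, self-contained sketch of the idea behind the u+v+w discussion
quoted above. It is not Eigen's code: the names Vec, Sum and eager_add
are hypothetical and exist only for this illustration, and it works on
plain floats rather than SIMD packets. The point is only that
operator+ can return a lightweight node instead of a result, so the
whole sum is computed in one loop with a single store per coefficient.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal vector type for illustration; not Eigen's implementation.
struct Vec {
    std::vector<float> data;
    explicit Vec(std::size_t n, float value = 0.0f) : data(n, value) {}
    std::size_t size() const { return data.size(); }
    float operator[](std::size_t i) const { return data[i]; }

    // Assigning from an expression runs one loop: each coefficient is
    // computed from the operands and stored exactly once.
    template <typename Expr>
    Vec& operator=(const Expr& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Lazy sum node: keeps references to its operands and computes nothing
// until operator[] is called during assignment.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    std::size_t size() const { return lhs.size(); }
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// u + v builds a node; (u + v) + w wraps that node in another node.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) { return {a, b}; }

// Eager alternative for comparison: a combined load+op+store function.
// Chaining it writes an intermediate Vec to memory and reads it back.
inline Vec eager_add(const Vec& a, const Vec& b) {
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out.data[i] = a[i] + b[i];
    return out;
}
```

With three vectors u, v, w and a destination r of the same size,
"r = u + v + w;" evaluates everything in the single loop inside
Vec::operator=, while "eager_add(eager_add(u, v), w)" gives the same
values but goes through a temporary vector. Note that the nodes hold
references, so in this sketch the expression has to be assigned within
the same statement that builds it.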