Re: [eigen] New(?) way to make using SIMD easier


ah and also.

if you just want a generic, easy-to-use way of performing a SIMD
operation on arrays in memory... then you can do something even simpler:
just use Map and do your operation on that. like:

  VectorXf::Map(dstPtr,num)
    = VectorXf::Map(srcPtr1,num)
    + VectorXf::Map(srcPtr2,num);

that compiles to just what you wanted, except that it adds some
code to deal with unaligned boundaries; but if 'num' is known at
compile time then you can avoid that by using Matrix<float,num,1>
instead of VectorXf.

If you want a generic operation instead of '+', see unaryExpr().


2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> 2009/11/24 Mark Borgerding <mark@xxxxxxxxxxxxxx>:
>> Just an idea:
>> What if the user could write code like:
>>   VectorOperator( std::plus<SomeDataType>() , dstPtr , srcPtr1, srcPtr2 ,
>> num );
>> which would use a SIMD-optimized call if one exists, and use a generic
>> algorithm otherwise.
> There seem to be 2 aspects in your proposal:
>  1) providing a uniform interface using functors
>  ---> we already have that: we have little functors encapsulating all
> sorts of operations and they provide a packetOp() method that does
> just that. See Functors.h. However they don't take care of
> loading/storing to memory, see next point (they take pre-loaded
> packets/registers).
> 2) providing functions that do load+operation+store instead of
> requiring one to call ei_pload and ei_pstore.
>  ---> but i don't think that's a good idea, because it means that when
> complex operations are compiled, a lot of redundant loads/stores happen.
> For example consider what would happen if we did implement operator+
> using such a function. Then compiling
>   u+v+w
> would result in basically:
> 1. load u from memory
> 2. load v from memory
> 3. add them, store to memory (temporary)
> 4. load that temporary again from memory
> 5. load w from memory
> 6. add them
> 7. store
> here, steps 3-4 are redundant. Using expression templates allows us to
> avoid compiling them. This is why we do not use combined functions for
> load+op+store.
> Just fyi here is basically the code that eigen emits for u+v+w,
> assuming for example Vector4f (so you also see how these ei_p*
> functions are used):
> Packet4f pu = ei_pload(u.data());
> Packet4f pv = ei_pload(v.data());
> Packet4f t = ei_padd(pu,pv);
> Packet4f pw = ei_pload(w.data());
> Packet4f result = ei_padd(t,pw);
> Here this Packet4f type is a typedef for a built-in type (e.g. __m128
> on SSE) that the compiler recognizes as a SIMD packet and knows how to
> store as a SIMD register (e.g. xmm0).
> See Core/arch/SSE/PacketMath.h
>> It might even be used to detect special conditions. e.g. if CUDA processing
>> is enabled and the source pointers are device memory, it performs all
>> calculations on the device, then brings the result back to the host only if
>> the destination resides in host memory.
> We wouldn't want to do any runtime branching in such a small and
> frequently-called function; if one wants to detect these things at
> runtime, one needs to do it at a wider level, otherwise too much time
> would be wasted in if's.
> If this is something that you wanted to do and didn't see how to do
> using Eigen's current infrastructure, can we help?
> Cheers,
> Benoit
