Re: [eigen] New(?) way to make using SIMD easier
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] New(?) way to make using SIMD easier
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Tue, 24 Nov 2009 11:42:34 -0500
2009/11/24 Mark Borgerding <mark@xxxxxxxxxxxxxx>:
> Just an idea:
> What if the user could write code like:
> VectorOperator(std::plus<SomeDataType>(), dstPtr, srcPtr1, srcPtr2, num);
> which would use a SIMD-optimized call if one exists, and use a generic
> algorithm otherwise.
There seem to be two aspects to your proposal:
1) providing a uniform interface using functors
---> we already have that: we have little functors encapsulating all
sorts of operations, and they provide a packetOp() method that does
just that. See Functors.h. However, they don't take care of
loading/storing to memory; see the next point (they take pre-loaded
packets as arguments).
2) providing functions that do load+operation+store instead of
requiring one to call ei_pload and ei_pstore.
---> but I don't think that's a good idea, because it means that when
complex operations are compiled, a lot of redundant loads/stores happen.
For example, consider what would happen if we implemented operator+
using such a function. Then compiling u+v+w
would result in basically:
1. load u from memory
2. load v from memory
3. add them, store to memory (temporary)
4. load that temporary again from memory
5. load w from memory
6. add them
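Here, steps 3-4 are redundant: the temporary is written to memory only
to be read right back. Using expression templates allows us to avoid
generating them. This is why we do not use combined
load+operation+store functions for operators like operator+.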
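A minimal, self-contained sketch of the load/store argument above. It does not use Eigen: Packet4f, pload, padd, pstore, sum_op, combined, sum3_combined and sum3_fused are hypothetical stand-ins (a plain 4-float struct instead of a real SIMD type), and the global counters g_loads and g_stores exist only to count memory traffic.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4-float SIMD packet: a plain struct so the
// sketch runs anywhere. Real code would use a compiler SIMD type.
struct Packet4f { float v[4]; };

// Counters for packet-level memory traffic (illustration only).
int g_loads = 0;
int g_stores = 0;

// Hypothetical analogues of ei_pload / ei_pstore / ei_padd.
Packet4f pload(const float* p) {
    assert(p != nullptr);
    ++g_loads;
    Packet4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
    return r;
}
void pstore(float* p, const Packet4f& a) {
    assert(p != nullptr);
    ++g_stores;
    for (int i = 0; i < 4; ++i) p[i] = a.v[i];
}
Packet4f padd(const Packet4f& a, const Packet4f& b) {
    Packet4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Functor in the style described in point 1: a scalar operator() plus a
// packetOp() that works on already-loaded packets (no memory access).
struct sum_op {
    float operator()(float a, float b) const { return a + b; }
    Packet4f packetOp(const Packet4f& a, const Packet4f& b) const { return padd(a, b); }
};

// The proposed style from point 2: one call does load + operation + store.
template <typename Op>
void combined(Op op, float* dst, const float* a, const float* b) {
    pstore(dst, op.packetOp(pload(a), pload(b)));
}

// u+v+w built from combined calls: the intermediate goes through memory.
void sum3_combined(float* dst, const float* u, const float* v, const float* w) {
    float tmp[4];
    combined(sum_op(), tmp, u, v);   // loads u, v; stores tmp
    combined(sum_op(), dst, tmp, w); // reloads tmp, loads w; stores dst
}

// u+v+w evaluated in one pass, as expression templates allow: the
// intermediate stays in a packet and only the final result is stored.
void sum3_fused(float* dst, const float* u, const float* v, const float* w) {
    sum_op op;
    Packet4f t = op.packetOp(pload(u), pload(v));
    pstore(dst, op.packetOp(t, pload(w)));
}
```

Through the combined calls, u+v+w costs 4 packet loads and 2 packet stores; the fused version needs 3 loads and 1 store. The difference is exactly the store/reload of the temporary in steps 3-4.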
Just FYI, here is basically the code that Eigen emits for u+v+w,
assuming for example Vector4f (so you can also see how these ei_p*
functions are used):
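    Packet4f pu = ei_pload(u.data());   // load u into a SIMD register
    Packet4f pv = ei_pload(v.data());   // load v
    Packet4f t = ei_padd(pu,pv);        // u+v, kept in a register
    Packet4f pw = ei_pload(w.data());   // load w
    Packet4f result = ei_padd(t,pw);    // (u+v)+w, no temporary stored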
Here, Packet4f is a typedef for a built-in type (e.g. __m128 on SSE)
that the compiler recognizes as a SIMD packet and knows how to keep in
a SIMD register (e.g. xmm0).
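To make the Packet4f description concrete, here is a sketch that compiles with GCC or Clang on any target they support. Instead of Eigen's __m128 typedef, it assumes the compilers' vector_size extension to declare a 16-byte vector of 4 floats; pload, padd, pstore and sum3 are hypothetical stand-ins for the ei_p* functions, not Eigen's API.

```cpp
#include <cassert>
#include <cstring>

// Packet4f as a compiler-recognized vector of 4 floats (GCC/Clang
// vector_size extension). Assumption: this stands in for Eigen's own
// typedef, which on SSE builds is __m128.
typedef float Packet4f __attribute__((vector_size(16)));

// Hypothetical stand-ins for ei_pload / ei_padd / ei_pstore.
// memcpy avoids alignment requirements; compilers typically lower it
// to a single vector load or store.
inline Packet4f pload(const float* p) {
    Packet4f r;
    std::memcpy(&r, p, sizeof r);
    return r;
}
inline Packet4f padd(Packet4f a, Packet4f b) { return a + b; }  // lane-wise add
inline void pstore(float* p, Packet4f a) { std::memcpy(p, &a, sizeof a); }

// Packet-level code for dst = u + v + w on 4-float vectors, mirroring
// the sequence above; only the final result is stored to memory.
void sum3(float* dst, const float* u, const float* v, const float* w) {
    assert(dst && u && v && w);
    Packet4f pu = pload(u);
    Packet4f pv = pload(v);
    Packet4f t = padd(pu, pv);   // intermediate can stay in a register
    Packet4f pw = pload(w);
    pstore(dst, padd(t, pw));
}
```

Because the compiler knows Packet4f is a vector type, a + b becomes a single lane-wise vector add, and t can live in a register between the two additions.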
> It might even be used to detect special conditions, e.g. if CUDA processing
> is enabled and the source pointers are device memory, it performs all
> calculations on the device, then brings the result back to the host only if
> the destination resides in host memory.
We wouldn't want to do any runtime branching in such a small and
frequently called function; if one wants to detect these things at
runtime, one needs to do it at a wider level, otherwise too much time
would be wasted in branches.
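A sketch of doing the runtime decision "at a wider level": the check runs once per call on a whole array, and the inner loop stays branch-free. Backend, add_arrays, the two kernels and the g_dispatch_checks counter are all hypothetical; the Device path is only a placeholder that reuses the host loop and contains no CUDA code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical backends; "Device" is only a label in this sketch.
enum class Backend { Host, Device };

// Counts runtime decisions, to show one check per call, not per element.
int g_dispatch_checks = 0;

// Branch-free inner kernel: no runtime checks inside the loop.
void add_kernel_host(float* dst, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}

// Placeholder for a device path (hypothetical); here it reuses the host loop.
void add_kernel_device(float* dst, const float* a, const float* b, std::size_t n) {
    add_kernel_host(dst, a, b, n);
}

// Wider-level entry point: decide once, then run the whole loop.
void add_arrays(Backend backend, float* dst, const float* a, const float* b, std::size_t n) {
    assert(dst && a && b);
    ++g_dispatch_checks;
    if (backend == Backend::Device)
        add_kernel_device(dst, a, b, n);
    else
        add_kernel_host(dst, a, b, n);
}
```

Adding two 1000-element arrays this way makes one decision instead of 1000, which is the cost the per-call branching would otherwise add.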
If there is something that you wanted to do and didn't see how to do
using Eigen's current infrastructure, can we help?