Re: [eigen] AVX/LRB and SSE with SoA support |

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

*To*: eigen@xxxxxxxxxxxxxxxxxxx*Subject*: Re: [eigen] AVX/LRB and SSE with SoA support*From*: keller <christoph-keller@xxxxxx>*Date*: Fri, 06 Aug 2010 19:11:53 +0200

Hello,

I can make a patch using Mercurial and write some test functions, if needed. Greetings, Christoph On 07/23/2010 01:19 PM, Gael Guennebaud wrote:

Hi, I've attached a modified version of your code for Eigen3 as well as the corresponding ASM generated with gcc 4.5. On Fri, Jul 23, 2010 at 6:45 AM, keller<christoph-keller@xxxxxx> wrote:Hello, I implemented a function where a sphere cuts a Ray Bundle (Most problems can be demonstrated there). It may not be optimally implemented, but i think it works as a simple example. I did not yet put the data into blocks, as a few other problems occurred. I used eigen2 for this. -using VS2008 and the Intel compiler it compiles and runs correctly. -g++ prints out errors with cwise()Yes this is because when you call a template function on a template type you have to use the template keyword, e.g.: mat.template block<rows,cols>(i,j)The resulting code does not use SSE-Instructions that often. I do not know if this is because i used eigen2 in the wrong way. The compiler(VS2008) has SSE-Instructions enabled. Also the _mm_or_ps is avaiable through ei_por in Eigen3, but not via a normal interface function.This is something that would make sense to have upstream in the Array API. The question is shall we use operator| (i.e., A | B) or a very explicit A.bitwiseOr(B) function. Since operator| is not defined for non integral type I would go for the explicit bitwiseOr._mm_movemask_ps is not avaiable at all.This one is more difficult to expose in a useful and portable way. Perhaps something like: ei_pallpositive(v) / ei_pallnegative(v) for the low level functions, and perhaps in second time we could have shortcuts for: (A> 0).all() and (A< 0).all() like A.allPositive() / A.allNegative() which would be built on top of the respective ei_p* functions, but I'm not 100% sure and this is really specific use cases. I doubt that's really critical even in your case. gaelIs it a good idea at all to use Eigen, if one does not want to be more than about 20% slower than with a naive Intrinsics-implementation?For this kind of low level stuff you can attack Eigen at two level: using Array<> API, or directly using the ei_p* functions. The former has the advantage to allow you to use "blocks" of multiple packets with automatic unrolling, while the later offers more flexibility for fine tuning...Greetings, Christoph On 07/03/2010 09:11 AM, Gael Guennebaud wrote:Hi, first things first, supporting a new vector engine in eigen is relatively easy. However, you won't be able to right code "templatized" on the vector engine, and even with a custom library I doubt that's doable. The choice of the vector engine has to be compiler options. So, the way to support multiple vector engines at runtime (I guess this is what you want?), is to have a trivial function foo(...) selecting the right implementation function e.g., foo_sse(...), foo_sse4(...), etc. The foo_* functions are implemented only once, by putting it in its own file which will be compiled multiple times with different compiler options (e.g., -msse, -msse4, etc.). The actual of the function (foo_sse) will be built using the preprocessor, e.g.: EIGEN_CAT(foo_,VECTOR_ENGINE_SUFFIX)(...) { .... } Now regarding the special interleaved packing of the data, there is currently no such thing in eigen, however, I think you can easily add that on top of, e.g., a Array<float,Dynamic,ei_packet_traits<float>::size,RowMajor>. It will be initialized with dimension * ((size/ei_packet_traits<float>::size)+((size%ei_packet_traits<float>::size)==0 ? 0 : 1) rows where dimension is three in your example, and size is the number of element. You can easily get the i-th element as follow: underlying_array.block<dimension,1>((i/ei_packet_traits<float>::size)*dimension, i%ei_packet_traits<float>::size) = Vector3f(x,y,z); The idea would be to add a class warping this underlying_array to make it convenient to use. The main questions are what is the set of features which have to be supported? How do we want to use it? through a manual loop over the (e.g., 3x4) blocks? through functors? through high level expression template code? etc. To finish, this is definitely something I planed to do the future, so I'd be glad to discuss with you its design and help you to get it right regarding Eigen. cheers, gael On Fri, Jul 2, 2010 at 11:56 PM, keller<christoph-keller@xxxxxx> wrote:Hello, I have to implement a raytracer that uses the Vector units of modern x86 cpus. Intel will introduce AVX soon. I have to support AVX and SSE and i do not want all of my code replicated. I currently use the Intels intrinsics. If Intel adds the Larrabee extensions into normal x86 CPUs one would even have to write three versions. As i use Eigen in parts of the software i wonder if one could support this with Eigen. The typical situation is like this. -One has a ray bundle with 16 rays that are arranged in SoA format: x1,..,x4 y1,...,y4 z1,...,z4 ..... x13,..,x16 y13,...,y16 z13,...,z16 This format has to be changed when using AVX/LRB of course. -One source of the rays and an object like a sphere Ideally i have a function like testCut(Bundles, Sphere) that uses a template parameter to set the Vector extension to use. I do not want to partially specialize this function but use some structures and functions in testCut that depend on the template parameter. The question for me is: Develop a small library of my own, or use Eigen (where i can contribute this functionality). I know i can do this with a custom library, but i dont know if it is easy to add this functionality to Eigen. I think a lot of people will have the same problem soon. Greetings, Christoph

EIGEN_STRONG_INLINE int bitmask() { return _mm_movemask_ps(_mm_load_ps(m_storage.data())); } EIGEN_STRONG_INLINE bool allNegative() { return bitmask() == 15; } template <typename mT> EIGEN_STRONG_INLINE Map<mT, Aligned> subCol(int i) { return Map<mT,Aligned>(&coeff(0,i)); } typedef Array<_Scalar, _Rows, 1, _Options, _MaxRows, 1> ArSoam; EIGEN_STRONG_INLINE ArSoam soam(Array& ar) { ArSoam ret = ArSoam::Zero(); for(int i=0; i < _Cols; i++) { ret += this->col(i)*ar.col(i); } return ret; } template <class T> EIGEN_STRONG_INLINE Array& loadRows(T vec) { for(int i = 0; i < _Cols; i++) this->col(i).fill(vec[i]); return *this; }

EIGEN_MAKE_CWISE_BINARY_OP(operator|, ei_scalar_or_op) EIGEN_MAKE_CWISE_BINARY_OP(operator&, ei_scalar_and_op) EIGEN_MAKE_CWISE_BINARY_OP(operator^, ei_scalar_xor_op) EIGEN_MAKE_CWISE_BINARY_OP(cmpgt, ei_scalar_cmpgt_op) template<typename OtherDerived> EIGEN_STRONG_INLINE Derived & operator|=(const ArrayBase<OtherDerived> &other) { SelfCwiseBinaryOp<ei_scalar_or_op<Scalar>, Derived> tmp(derived()); tmp = other; return derived(); } template<typename OtherDerived> EIGEN_STRONG_INLINE Derived & operator&=(const ArrayBase<OtherDerived> &other) { SelfCwiseBinaryOp<ei_scalar_and_op<Scalar>, Derived> tmp(derived()); tmp = other; return derived(); } template<typename OtherDerived> EIGEN_STRONG_INLINE Derived & operator^=(const ArrayBase<OtherDerived> &other) { SelfCwiseBinaryOp<ei_scalar_xor_op<Scalar>, Derived> tmp(derived()); tmp = other; return derived(); }

/** \internal * \brief Template functor to compute the binary or of two scalars * * \sa class CwiseBinaryOp, MatrixBase::operator+, class VectorwiseOp, MatrixBase::sum() */ template<typename Scalar> struct ei_scalar_or_op { EIGEN_EMPTY_STRUCT_CTOR(ei_scalar_or_op) EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { unsigned tmp = (*((unsigned*)&a)) | (*((unsigned*)&b)); return *((float*)(&tmp));} template<typename PacketScalar> EIGEN_STRONG_INLINE const PacketScalar packetOp(const PacketScalar& a, const PacketScalar& b) const { return ei_por(a,b); } template<typename PacketScalar> EIGEN_STRONG_INLINE const Scalar predux(const PacketScalar& a) const { return ei_predux(a); } }; template<typename Scalar> struct ei_functor_traits<ei_scalar_or_op<Scalar> > { enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = ei_packet_traits<Scalar>::size>1 }; }; /** \internal * \brief Template functor to compute the binary and of two scalars * * \sa class CwiseBinaryOp, MatrixBase::operator+, class VectorwiseOp, MatrixBase::sum() */ template<typename Scalar> struct ei_scalar_and_op { EIGEN_EMPTY_STRUCT_CTOR(ei_scalar_and_op) EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { unsigned tmp = (*((unsigned*)&a)) & *(((unsigned*)&b)); return *((float*)(&tmp)); } template<typename PacketScalar> EIGEN_STRONG_INLINE const PacketScalar packetOp(const PacketScalar& a, const PacketScalar& b) const { return ei_pand(a,b); } template<typename PacketScalar> EIGEN_STRONG_INLINE const Scalar predux(const PacketScalar& a) const { return ei_predux(a); } }; template<typename Scalar> struct ei_functor_traits<ei_scalar_and_op<Scalar> > { enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = ei_packet_traits<Scalar>::size>1 }; }; /** \internal * \brief Template functor to compute the binary xor of two scalars * * \sa class CwiseBinaryOp, MatrixBase::operator+, class VectorwiseOp, MatrixBase::sum() */ template<typename Scalar> struct ei_scalar_xor_op { EIGEN_EMPTY_STRUCT_CTOR(ei_scalar_xor_op) EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { unsigned tmp = (*((unsigned*)&a)) ^ (*((unsigned*)&b)); return *((float*)(&tmp)); } template<typename PacketScalar> EIGEN_STRONG_INLINE const PacketScalar packetOp(const PacketScalar& a, const PacketScalar& b) const { return ei_pxor(a,b); } template<typename PacketScalar> EIGEN_STRONG_INLINE const Scalar predux(const PacketScalar& a) const { return ei_predux(a); } }; template<typename Scalar> struct ei_functor_traits<ei_scalar_xor_op<Scalar> > { enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = ei_packet_traits<Scalar>::size>1 }; };

// EigenTest.cpp : Definiert den Einstiegspunkt ___ die Konsolenanwendung. // #define EIGEN_ARRAYBASE_PLUGIN "arraybase.h" #define EIGEN_ARRAY_PLUGIN "array.h" #include <Eigen/Core> #include <Eigen/Geometry> #include <iostream> #include <xmmintrin.h> #include <emmintrin.h> using namespace Eigen; using namespace std; #ifndef EIGEN_VECTORIZE #error #endif /* cut sphere (r, pos) with rays (from, step) * only test if cut occurs */ template <int vlen, int numar> EIGEN_DONT_INLINE void cut(float& r, Vector4f& pos, Array<float, vlen, numar*3>& from, Array<float, vlen, numar*3>& step, Array<float, vlen, numar>& mask) { typedef Array<float , vlen, 1> cT; typedef Array<float , vlen, 3> mT; mT SOAp; SOAp.loadRows(pos); cT SOArr = cT::Constant(r*r); /* obviously for sphere: * (from+t*step)^2 = r*r * from*step*2*t+from^2+t*t*step^2 = r*r * (t*t*step^2) + (2*from*step*t) + (from^2-r*r); * a*t* t + b*t + c */ cout << "Sphpos:" << SOAp << endl; cout << "SphRad:" << SOArr << endl; for(int i=0; i < numar; i ++) { mT frS = from.template subCol<mT>(i*3); mT stS = step.template subCol<mT>(i*3); Map<cT, Aligned> maskS = mask.template subCol<cT>(i); //cout << "from:\n" << frS << endl; //cout << "step:\n" << stS << endl; //cout << "mask:\n" << maskS << endl; mT diff = frS - stS; cT aInv = stS.soam(stS).inverse(); //cout << "(step*step).inverse()\n" << aInv << endl; cT b = 2*stS.soam(frS); cT c = frS.soam(frS) - SOArr; cT p = b * aInv; cT q = c * aInv; //cout << "p:\n" << p << endl; //cout << "q:\n" << q << endl; cT sqrVal = (p*p)*0.25f - q; //cout << "sqrVal:\n" << sqrVal << endl; if(sqrVal.allNegative()) continue; EIGEN_ASM_COMMENT("call to sqrt"); sqrVal = sqrVal.sqrt(); EIGEN_ASM_COMMENT("after call to sqrt"); cT bigSol = -p*0.5f + sqrVal; std::cout << "bigSol:\n" << bigSol << endl << "BeforeMask\n" << maskS << endl; maskS |= bigSol.cmpgt(cT::Zero()); std::cout << "AfterMask\n" << maskS << endl; } } int main(int argc, char** argv) { using namespace std; #define VLEN 4 #define NUMAR 16/VLEN float r = 4; Vector4f pos = Vector4f::Constant(1); pos[2] = 2; Array<float, VLEN, NUMAR*3> from = Array<float, VLEN, NUMAR*3>::Zero(); Array<float, VLEN, NUMAR*3> step = Array<float, VLEN, NUMAR*3>::Ones(); Array<float, VLEN, NUMAR> hitmask = Array<float, VLEN, NUMAR>::Zero(); /*all ray should cut the sphere*/ cut<VLEN, NUMAR>(r, pos, from, step, hitmask); cout << "Hitmask after Call\n" << hitmask << endl; return 0; }

**Follow-Ups**:**Re: [eigen] AVX/LRB and SSE with SoA support***From:*Benoit Jacob

**Re: [eigen] AVX/LRB and SSE with SoA support***From:*Rohit Garg

**Messages sorted by:**[ date | thread ]- Prev by Date:
**Re: [eigen] Do we need geometry refactoring?** - Next by Date:
**Re: [eigen] AVX/LRB and SSE with SoA support** - Previous by thread:
**[eigen] pseudo-inverse** - Next by thread:
**Re: [eigen] AVX/LRB and SSE with SoA support**

Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |