Re: [eigen] AVX/LRB and SSE with SoA support |
[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]
Hi, I've attached a modified version of your code for Eigen3 as well as the corresponding ASM generated with gcc 4.5. On Fri, Jul 23, 2010 at 6:45 AM, keller <christoph-keller@xxxxxx> wrote: > Hello, > > I implemented a function where a sphere cuts a Ray Bundle (Most problems can > be demonstrated there). It may not be optimally implemented, but i think it > works as a simple example. > > I did not yet put the data into blocks, as a few other problems occurred. > > I used eigen2 for this. > -using VS2008 and the Intel compiler it compiles and runs correctly. > -g++ prints out errors with cwise() Yes this is because when you call a template function on a template type you have to use the template keyword, e.g.: mat.template block<rows,cols>(i,j) > The resulting code does not use SSE-Instructions that often. I do not know > if this is because i used eigen2 in the wrong way. The compiler(VS2008) has > SSE-Instructions enabled. > > Also the _mm_or_ps is avaiable through ei_por in Eigen3, but not via a > normal interface function. This is something that would make sense to have upstream in the Array API. The question is shall we use operator| (i.e., A | B) or a very explicit A.bitwiseOr(B) function. Since operator| is not defined for non integral type I would go for the explicit bitwiseOr. > _mm_movemask_ps is not avaiable at all. This one is more difficult to expose in a useful and portable way. Perhaps something like: ei_pallpositive(v) / ei_pallnegative(v) for the low level functions, and perhaps in second time we could have shortcuts for: (A > 0).all() and (A < 0).all() like A.allPositive() / A.allNegative() which would be built on top of the respective ei_p* functions, but I'm not 100% sure and this is really specific use cases. I doubt that's really critical even in your case. gael > Is it a good idea at all to use Eigen, if one does not want to be more than > about 20% slower than with a naive Intrinsics-implementation? For this kind of low level stuff you can attack Eigen at two level: using Array<> API, or directly using the ei_p* functions. The former has the advantage to allow you to use "blocks" of multiple packets with automatic unrolling, while the later offers more flexibility for fine tuning... > Greetings, > Christoph > > > On 07/03/2010 09:11 AM, Gael Guennebaud wrote: >> >> Hi, >> >> first things first, supporting a new vector engine in eigen is >> relatively easy. However, you won't be able to right code >> "templatized" on the vector engine, and even with a custom library I >> doubt that's doable. The choice of the vector engine has to be >> compiler options. So, the way to support multiple vector engines at >> runtime (I guess this is what you want?), is to have a trivial >> function foo(...) selecting the right implementation function e.g., >> foo_sse(...), foo_sse4(...), etc. The foo_* functions are implemented >> only once, by putting it in its own file which will be compiled >> multiple times with different compiler options (e.g., -msse, -msse4, >> etc.). The actual of the function (foo_sse) will be built using the >> preprocessor, e.g.: >> >> EIGEN_CAT(foo_,VECTOR_ENGINE_SUFFIX)(...) { .... } >> >> Now regarding the special interleaved packing of the data, there is >> currently no such thing in eigen, however, I think you can easily add >> that on top of, e.g., a >> Array<float,Dynamic,ei_packet_traits<float>::size,RowMajor>. It will >> be initialized with dimension * >> >> ((size/ei_packet_traits<float>::size)+((size%ei_packet_traits<float>::size)==0 >> ? 0 : 1) rows where dimension is three in your example, and size is >> the number of element. You can easily get the i-th element as follow: >> >> >> underlying_array.block<dimension,1>((i/ei_packet_traits<float>::size)*dimension, >> i%ei_packet_traits<float>::size) = Vector3f(x,y,z); >> >> The idea would be to add a class warping this underlying_array to make >> it convenient to use. The main questions are what is the set of >> features which have to be supported? How do we want to use it? through >> a manual loop over the (e.g., 3x4) blocks? through functors? through >> high level expression template code? etc. >> >> To finish, this is definitely something I planed to do the future, so >> I'd be glad to discuss with you its design and help you to get it >> right regarding Eigen. >> >> cheers, >> >> gael >> >> On Fri, Jul 2, 2010 at 11:56 PM, keller<christoph-keller@xxxxxx> wrote: >> >>> >>> Hello, >>> >>> I have to implement a raytracer that uses the Vector units of modern x86 >>> cpus. >>> >>> Intel will introduce AVX soon. I have to support AVX and SSE and i do not >>> want all of my code replicated. I currently use the Intels intrinsics. If >>> Intel adds the Larrabee extensions into normal x86 CPUs one would even >>> have >>> to write three versions. >>> >>> As i use Eigen in parts of the software i wonder if one could support >>> this >>> with Eigen. The typical situation is like this. >>> -One has a ray bundle with 16 rays that are arranged in SoA format: >>> x1,..,x4 >>> y1,...,y4 >>> z1,...,z4 >>> ..... >>> x13,..,x16 >>> y13,...,y16 >>> z13,...,z16 >>> This format has to be changed when using AVX/LRB of course. >>> -One source of the rays and an object like a sphere >>> >>> Ideally i have a function like testCut(Bundles, Sphere) that uses a >>> template >>> parameter to set the Vector extension to use. I do not want to partially >>> specialize this function but use some structures and functions in testCut >>> that depend on the template parameter. >>> >>> The question for me is: Develop a small library of my own, or use Eigen >>> (where i can contribute this functionality). I know i can do this with a >>> custom library, but i dont know if it is easy to add this functionality >>> to >>> Eigen. >>> >>> I think a lot of people will have the same problem soon. >>> >>> Greetings, >>> Christoph >>> >>> >>> >>> >>> >> >> >> > >
// EigenTest.cpp : Definiert den Einstiegspunkt ___ die Konsolenanwendung. // #include <Eigen/Core> #include <Eigen/Geometry> #include <iostream> #include <xmmintrin.h> #include <emmintrin.h> using namespace Eigen; #ifndef EIGEN_VECTORIZE #error #endif typedef Vector4f Vec; template <int h, int w> EIGEN_STRONG_INLINE Array<float , h, 1> SOAm (Array<float, h, w>& l, Array<float, h, w>& r) { EIGEN_ASM_COMMENT("begin SOAm"); Array<float , h, 1> ret = Array<float , h, 1>::Zero(); for(int i=0; i < w; i++) { ret += l.col(i) * r.col(i); } EIGEN_ASM_COMMENT("end SOAm"); return ret; } /* cut sphere (r, pos) with rays (from, step) * only test if cut occurs */ template <int len, int vLen> EIGEN_DONT_INLINE void cut(float& r, Vec& pos, Array<float, len, 3>& from, Array<float, len, 3>& step, Array<float, len, 1>& mask) { typedef Array<float , vLen, 1> cT; typedef Array<float , vLen, 3> mT; mT SOAp; cT SOArr = cT::Constant(r*r); /* obviously for sphere: * (from+t*step)^2 = r*r * from*step*2*t+from^2+t*t*step^2 = r*r * (t*t*step^2) + (2*from*step*t) + (from^2-r*r); * a*t* t + b*t + c */ for(int i=0; i < 3; i++) //this may be set into the constructor SOAp.col(i).fill(pos[i]); for(int i=0; i < len; i += vLen) { // Use Map to tell Eigen the alignment is preserved mT frS = Map<mT,Aligned,OuterStride<len> >(&from.coeff(i,0)); mT stS = Map<mT,Aligned,OuterStride<len> >(&step.coeff(i,0)); mT diff = frS - stS; EIGEN_ASM_COMMENT("three calls to SOAm"); //This does not get inlined! cT aInv = SOAm<vLen,3>(stS, stS).inverse(); cT b = 2*SOAm<vLen,3>(stS, frS); cT c = SOAm<vLen,3>(frS, frS) - SOArr; cT p = b * aInv; cT q = b * aInv; cT sqrVal = (p*p)*0.25f - q; if(_mm_movemask_ps(_mm_load_ps(sqrVal.data())) == 15) continue; EIGEN_ASM_COMMENT("call to sqrt"); sqrVal = sqrVal.sqrt(); /* does this use _mm_sqrt_ps()? */ cT bigSol = -p*0.5f + sqrVal; EIGEN_ASM_COMMENT("manual SSE begins here"); /*may be better to do this in the add unit*/ __m128 doesCut = _mm_or_ps( /*case some sqrVal < 0 not considered here*/ _mm_cmpgt_ps(_mm_load_ps(bigSol.data()),_mm_setzero_ps()), _mm_load_ps(mask.data()+i)); _mm_store_ps(mask.data()+i, doesCut); } EIGEN_ASM_COMMENT("end"); } int main(int argc, char** argv) { using namespace std; #define LEN 16 float r = 4; Vec pos = Vec::Constant(1); pos[2] = 2; /* I am not sure if sth. like * Array<float, 3, 4>[4] from, step; * would be better for the memory access in cut*/ Array<float, LEN, 3> from = Array<float, LEN, 3>::Zero(); Array<float, LEN, 3> step = Array<float, LEN, 3>::Ones(); /*use float, as the floating point unit is used*/ Array<float, LEN, 1> hitmask = Array<float, LEN, 1>::Zero(); /*all ray should cut the sphere*/ cut<LEN, 4>(r, pos, from, step, hitmask); return 0; }
Attachment:
rayb.asm
Description: Binary data
Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |