Re: [eigen] AVX/LRB and SSE with SoA support |

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

*To*: eigen@xxxxxxxxxxxxxxxxxxx*Subject*: Re: [eigen] AVX/LRB and SSE with SoA support*From*: keller <christoph-keller@xxxxxx>*Date*: Fri, 23 Jul 2010 06:45:19 +0200

Hello,

I did not yet put the data into blocks, as a few other problems occurred. I used eigen2 for this. -using VS2008 and the Intel compiler it compiles and runs correctly. -g++ prints out errors with cwise()

Would EIGEN_MATRIXBASE_PLUGIN be the way to include these?

Greetings, Christoph On 07/03/2010 09:11 AM, Gael Guennebaud wrote:

Hi, first things first, supporting a new vector engine in eigen is relatively easy. However, you won't be able to right code "templatized" on the vector engine, and even with a custom library I doubt that's doable. The choice of the vector engine has to be compiler options. So, the way to support multiple vector engines at runtime (I guess this is what you want?), is to have a trivial function foo(...) selecting the right implementation function e.g., foo_sse(...), foo_sse4(...), etc. The foo_* functions are implemented only once, by putting it in its own file which will be compiled multiple times with different compiler options (e.g., -msse, -msse4, etc.). The actual of the function (foo_sse) will be built using the preprocessor, e.g.: EIGEN_CAT(foo_,VECTOR_ENGINE_SUFFIX)(...) { .... } Now regarding the special interleaved packing of the data, there is currently no such thing in eigen, however, I think you can easily add that on top of, e.g., a Array<float,Dynamic,ei_packet_traits<float>::size,RowMajor>. It will be initialized with dimension * ((size/ei_packet_traits<float>::size)+((size%ei_packet_traits<float>::size)==0 ? 0 : 1) rows where dimension is three in your example, and size is the number of element. You can easily get the i-th element as follow: underlying_array.block<dimension,1>((i/ei_packet_traits<float>::size)*dimension, i%ei_packet_traits<float>::size) = Vector3f(x,y,z); The idea would be to add a class warping this underlying_array to make it convenient to use. The main questions are what is the set of features which have to be supported? How do we want to use it? through a manual loop over the (e.g., 3x4) blocks? through functors? through high level expression template code? etc. To finish, this is definitely something I planed to do the future, so I'd be glad to discuss with you its design and help you to get it right regarding Eigen. cheers, gael On Fri, Jul 2, 2010 at 11:56 PM, keller<christoph-keller@xxxxxx> wrote:Hello, I have to implement a raytracer that uses the Vector units of modern x86 cpus. Intel will introduce AVX soon. I have to support AVX and SSE and i do not want all of my code replicated. I currently use the Intels intrinsics. If Intel adds the Larrabee extensions into normal x86 CPUs one would even have to write three versions. As i use Eigen in parts of the software i wonder if one could support this with Eigen. The typical situation is like this. -One has a ray bundle with 16 rays that are arranged in SoA format: x1,..,x4 y1,...,y4 z1,...,z4 ..... x13,..,x16 y13,...,y16 z13,...,z16 This format has to be changed when using AVX/LRB of course. -One source of the rays and an object like a sphere Ideally i have a function like testCut(Bundles, Sphere) that uses a template parameter to set the Vector extension to use. I do not want to partially specialize this function but use some structures and functions in testCut that depend on the template parameter. The question for me is: Develop a small library of my own, or use Eigen (where i can contribute this functionality). I know i can do this with a custom library, but i dont know if it is easy to add this functionality to Eigen. I think a lot of people will have the same problem soon. Greetings, Christoph

// EigenTest.cpp : Definiert den Einstiegspunkt für die Konsolenanwendung. // #include <Eigen/Core> #include <Eigen/Geometry> #include <iostream> #include <xmmintrin.h> #include <emmintrin.h> USING_PART_OF_NAMESPACE_EIGEN #ifndef EIGEN_VECTORIZE #error #endif typedef Vector4f Vec; template <int h, int w> EIGEN_STRONG_INLINE Matrix<float , h, 1> SOAm (Matrix<float, h, w>& l, Matrix<float, h, w>& r) { Matrix<float , h, 1> ret = Matrix<float , h, 1>::Zero(); for(int i=0; i < w; i++) { ret += l.block<h, 1>(0, i).cwise() * r.block<h, 1>(0, i); } return ret; } /* cut sphere (r, pos) with rays (from, step) * only test if cut occurs */ template <int len, int vLen> void cut(float& r, Vec& pos, Matrix<float, len, 3>& from, Matrix<float, len, 3>& step, Matrix<float, len, 1>& mask) { typedef Matrix<float , vLen, 1> cT; typedef Matrix<float , vLen, 3> mT; mT SOAp; cT SOArr = cT::Constant(r*r); /* obviously for sphere: * (from+t*step)^2 = r*r * from*step*2*t+from^2+t*t*step^2 = r*r * (t*t*step^2) + (2*from*step*t) + (from^2-r*r); * a*t* t + b*t + c */ for(int i=0; i < 3; i++) //this may be set into the constructor SOAp.block<vLen,1>(0,i).fill(pos[i]); for(int i=0; i < len; i += vLen) { mT frS = from.block<vLen,3>(i,0); mT stS = step.block<vLen,3>(i,0); mT diff = frS - stS; //This does not get inlined! cT aInv = SOAm<vLen,3>(stS, stS).cwise().inverse(); cT b = 2*SOAm<vLen,3>(stS, frS); cT c = SOAm<vLen,3>(frS, frS) - SOArr; cT p = b.cwise()*aInv; cT q = b.cwise()*aInv; cT sqrVal = p.cwise()*p*0.25f - q; if(_mm_movemask_ps(_mm_load_ps(sqrVal.data())) == 15) continue; sqrVal = sqrVal.cwise().sqrt(); /* does this use _mm_sqrt_ps()? */ cT bigSol = -p*0.5f + sqrVal; /*may be better to do this in the add unit*/ __m128 doesCut = _mm_or_ps( /*case some sqrVal < 0 not considered here*/ _mm_cmpgt_ps(_mm_load_ps(bigSol.data()),_mm_setzero_ps()), _mm_load_ps(mask.data()+i)); _mm_store_ps(mask.data()+i, doesCut); } } int main(int argc, char** argv) { using namespace std; #define LEN 16 float r = 4; Vec pos = Vec::Constant(1); pos[2] = 2; /* I am not sure if sth. like * Matrix<float, 3, 4>[4] from, step; * would be better for the memory access in cut*/ Matrix<float, LEN, 3> from = Matrix<float, LEN, 3>::Zero(); Matrix<float, LEN, 3> step = Matrix<float, LEN, 3>::Ones(); /*use float, as the floating point unit is used*/ Matrix<float, LEN, 1> hitmask = Matrix<float, LEN, 1>::Zero(); /*all ray should cut the sphere*/ cut<LEN, 4>(r, pos, from, step, hitmask); return 0; }

**Follow-Ups**:**Re: [eigen] AVX/LRB and SSE with SoA support***From:*Gael Guennebaud

**References**:**[eigen] AVX/LRB and SSE with SoA support***From:*keller

**Re: [eigen] AVX/LRB and SSE with SoA support***From:*Gael Guennebaud

**Messages sorted by:**[ date | thread ]- Prev by Date:
**Re: [eigen] Issues regarding Quaternion-alignment and const Maps** - Next by Date:
**Re: [eigen] Issues regarding Quaternion-alignment and const Maps** - Previous by thread:
**Re: [eigen] AVX/LRB and SSE with SoA support** - Next by thread:
**Re: [eigen] AVX/LRB and SSE with SoA support**

Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |