[eigen] OpenMP implementation of Matrix*Vector operation


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: [eigen] OpenMP implementation of Matrix*Vector operation
*From*: gr x <xgrchn@xxxxxxxxx>
*Date*: Tue, 15 May 2012 14:52:49 +0000

Hi everyone,

I'm trying to use Eigen to solve some linear algebra equations iteratively, so the matrix*vector operation is quite common. As far as I know, in Eigen the OpenMP parallelization is only implemented for matrix*matrix multiplication (tell me if I'm wrong). However, in my case the matrix is often moderately large (typically several thousand rows, or even hundreds of thousands for sparse matrices), so it is quite necessary to take advantage of a multicore CPU. I've heard this is on the schedule; how is it going now? Are there any benchmark results with respect to matrix size? Thanks!

PS: I've actually implemented a simple version by partitioning the matrix into several row blocks, but it turns out to work well only for sizes around 100-2000; for larger sizes it is much slower (i.e., no better than the serial code). Here is my code snippet (built with g++ and the flags -O2 -fopenmp -march=native):

    int N = 1000; // problem size
    Matrix<double, Dynamic, Dynamic, RowMajor> m = Matrix<double, Dynamic, Dynamic>::Random(N, N);
    VectorXd v = VectorXd::Random(N);
    VectorXd s(N);

    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int rank = omp_get_thread_num();
        int chunk = (N + nthreads - 1) / nthreads;                 // rows per thread, rounded up
        int i0 = rank * chunk;                                     // first row of this thread's block
        int i1 = (rank + 1) * chunk < N ? (rank + 1) * chunk : N;  // one past the last row
        int in = i1 - i0;                                          // block height
        if (in > 0)
            s.segment(i0, in) = m.block(i0, 0, in, N) * v;
    }

Is there a better way to do this? Thanks very much!

