[eigen] OpenMP implementation of Matrix*Vector operation
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] OpenMP implementation of Matrix*Vector operation
- From: gr x <xgrchn@xxxxxxxxx>
- Date: Tue, 15 May 2012 14:52:49 +0000
Hi everyone:
I'm trying to use Eigen to solve some linear algebra equations
iteratively, so the Matrix*Vector operation is very common. As far as I know,
in Eigen the OpenMP parallelization is only implemented for matrix*matrix
multiplication (please tell me if I'm wrong).
However, in my case the matrix is often moderately large (typically
several thousand rows, or even hundreds of thousands in the sparse case),
so it is quite necessary to take advantage of a multicore CPU.
I've heard this is on the schedule, so how is it going now? Are there
any benchmark results with respect to matrix size?
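
For reference, the way I currently use the existing parallelization is
roughly the following (just a sketch of my understanding; I believe
Eigen::setNbThreads is the relevant call, and the products are only
multithreaded when the code is built with -fopenmp):

    // g++ -O2 -fopenmp -march=native
    #include <Eigen/Dense>
    using namespace Eigen;

    int main()
    {
        setNbThreads(4);                          // optional: cap the number of threads Eigen uses
        MatrixXd A = MatrixXd::Random(2000, 2000);
        MatrixXd B = MatrixXd::Random(2000, 2000);
        MatrixXd C = A * B;                       // dense matrix*matrix product: runs multithreaded
        VectorXd y = A * VectorXd::Random(2000);  // matrix*vector product: still single-threaded
        return 0;
    }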
Thanks~
PS: Actually I've implemented a simple version by partitioning the matrix
into several "blocks", but it turns out to work well only for sizes around
100~2000; for larger sizes it is much slower (i.e., no better than the
serial code). Here is my code snippet:
(built with g++, flags: -O2 -fopenmp -march=native)
    // needs <Eigen/Dense> and <omp.h>
    int N = 1000; // scale
    Matrix<double,Dynamic,Dynamic,RowMajor> m =
        Matrix<double,Dynamic,Dynamic,RowMajor>::Random(N,N);
    VectorXd v = VectorXd::Random(N);
    VectorXd s(N);

    #pragma omp parallel
    {
        // each thread computes one contiguous block of rows of the result
        int nthreads = omp_get_num_threads();
        int rank     = omp_get_thread_num();
        int chunk    = (N + nthreads - 1) / nthreads;             // rows per thread, rounded up
        int i0 = rank * chunk;                                    // first row of this thread's block
        int i1 = (rank + 1) * chunk < N ? (rank + 1) * chunk : N; // one past the last row
        if (i0 < N)
            s.segment(i0, i1 - i0) = m.block(i0, 0, i1 - i0, N) * v;
    }
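
One alternative I've been considering (just a sketch; the helper name
parallel_matvec and the chunk size of 256 rows are placeholders I made up,
not something I've tuned) is a plain OpenMP for over fixed-size row blocks
instead of one big block per thread:

    #include <Eigen/Dense>
    #include <omp.h>
    #include <algorithm>
    using namespace Eigen;

    void parallel_matvec(const Matrix<double,Dynamic,Dynamic,RowMajor>& m,
                         const VectorXd& v, VectorXd& s)
    {
        const int N = m.rows();
        const int chunk = 256;                   // rows per iteration; arbitrary, needs tuning
        #pragma omp parallel for schedule(static)
        for (int i0 = 0; i0 < N; i0 += chunk)
        {
            int rows = std::min(chunk, N - i0);
            // each iteration writes an independent slice of s, so no synchronization is needed
            s.segment(i0, rows).noalias() = m.block(i0, 0, rows, N) * v;
        }
    }

Smaller chunks should at least balance the work better across cores, but the
product is mostly memory-bound, so I'm not sure how much it can help.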
Is there a better way to do this?
Thanks very much!