Re: [eigen] BLAS backend



Hi Christian,

Almost anything you can do using MPI on a shared memory machine you
can do with OpenMP with added flexibility.  MPI is a message passing
paradigm and isn't suited for the sort of multicore optimization we're
looking at this semester.  An MPI-enabled Eigen is an interesting
idea, and I agree with your thoughts on high-level optimization and
distribution of parallel tasks.

A

On Fri, Oct 16, 2009 at 12:08 PM, Christian Mayer
<mail@xxxxxxxxxxxxxxxxx> wrote:
> Hi all,
>
> @Jean Sreng: I don't know what your VR simulation is doing in detail.
> But for the normal 3D stuff you should be using small (3x3 or 4x4)
> fixed-size matrices; those are perfectly supported by Eigen2, and using
> any BLAS implementation will only give you worse performance.
> If you are using big, variable-sized matrices (e.g. to solve linear
> equation systems), the question of a BLAS backend is valid, although
> it's worth trying the native implementation of Eigen2.
>
> As Eigen2 uses expression templates, even a GPU-based backend might
> profit from Eigen, as it can optimize the calculations to minimize the
> data transfers between CPU and GPU thanks to lazy evaluation of the
> expression.
>
> Aron Ahmadia schrieb:
>> It actually turns out that one of my students this semester is
>> considering how to parallelize Eigen using OpenMP for multithreading,
>
> I'm not too fond of OpenMP for parallelisation in our case. It's very
> good for parallelising a loop (e.g. to multithread a single matrix
> multiplication), but it might not be powerful enough to take advantage
> of the expression templates. MPI might be a better option there.
>
> Parallelisation at the algorithm level (= expression template level)
> gives you the advantage of performing operations that have no
> dependency on each other concurrently. For example:
>
>  result = A*B + C*D    (A,B,C,D are big matrices)
>
> It's much better to have - on a two-core CPU - one thread that's
> calculating A*B and another doing C*D than both threads fighting each
> other (= locks) doing A*B and then doing C*D...
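
A minimal sketch of the task-level split described above, written in
plain C++ so that it does not depend on any particular Eigen API. The
Matrix type and the multiply and sum_of_products functions are
hypothetical stand-ins, not part of Eigen. The two independent products
are expressed as OpenMP sections purely for illustration; without
OpenMP enabled at compile time the pragmas are ignored and the code
runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major dense matrix; stands in for a big Eigen matrix.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Naive single-threaded product; a real backend would use an optimized GEMM.
Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix out(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}

// result = A*B + C*D, with the two independent products as separate tasks.
Matrix sum_of_products(const Matrix& a, const Matrix& b,
                       const Matrix& c, const Matrix& d) {
    Matrix ab(a.rows, b.cols), cd(c.rows, d.cols);
    #pragma omp parallel sections
    {
        #pragma omp section
        { ab = multiply(a, b); }  // one thread computes A*B
        #pragma omp section
        { cd = multiply(c, d); }  // another thread computes C*D
    }
    // The final addition depends on both products, so it runs after the join.
    assert(ab.rows == cd.rows && ab.cols == cd.cols);
    for (std::size_t i = 0; i < ab.data.size(); ++i)
        ab.data[i] += cd.data[i];
    return ab;
}
```

Each product writes only to its own temporary, so no locking is needed
until the final addition. Whether this beats a multithreaded GEMM per
product depends on matrix sizes and core count, which is the
strategy-selection problem raised further down.
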
>
> In the end you need multithreaded elementary operations (like GEMM)
> *and* high-level parallelisation of the algorithm. And an optimizer
> that decides the best multithreading strategy for each algorithm. A
> very interesting and highly complex task.
>
> CU,
> Christian
>
>


