Rohit Garg wrote:
>> So, in which area does Intel MKL still have a long-term lead? I would
>> say parallelization. We haven't started that yet and it is probably a
>> very, very tough one. It's what I have in mind when I say that a
>> BLAS/LAPACK wrapper is still welcome.
> Why do you think parallelization is very difficult? Do you mean
> parallelization infrastructure? AFAICS, using OpenMP would be cool. Let
> the compiler handle all the dirty business, etc. This is something I want
> to explore (time availability is of course important!), so I would like
> some heads-up.

(Disclaimer: I haven't done much parallelization in the past, but I
have followed a few lectures and various discussions on the net with
much interest.)

Do NOT use OpenMP in our case!

OpenMP is great for parallelizing a few loops in old code when you can't
spend the time to do it right. If you have the choice, you should
always rethink every algorithm and implement it in a parallel way (with
the threading library of your choice).


That way you've got much more control and you can plan ahead.

In the case of EIGEN, the expression templates give us a very strong
base that can support such an approach. Because the compiler sees the
whole calculation ahead of time, some of the calculations could be
parallelized. E.g. look at the expression:

  E = A*B + C*D

You could do it the dumb way: a parallel A*B, then a parallel C*D,
and at the end a parallel E = prod1 + prod2. That's what OpenMP could
offer you.
But wouldn't it be much wiser to run A*B in one thread and C*D in another,
and then compute E = prod1 + prod2? That's much better for data locality...
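
To make the second variant concrete, here is a rough sketch in plain C++11
using std::async. It is NOT EIGEN code and not a proposal for the actual
implementation: Mat, multiply and sum_of_products are hypothetical names
made up for this mail, and the matrix type is a toy row-major one. I also
assume it is fine to run C*D on the calling thread instead of spawning a
second worker for it.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical toy matrix (row-major), standing in for a real EIGEN type.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Plain serial product. Each task runs one of these from start to finish,
// so one thread touches all the data of "its" product.
Mat multiply(const Mat& a, const Mat& b) {
    assert(a.cols == b.rows);
    Mat r(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t k = 0; k < a.cols; ++k) {
            const double aik = a(i, k);
            for (std::size_t j = 0; j < b.cols; ++j) {
                r(i, j) += aik * b(k, j);
            }
        }
    }
    return r;
}

// E = A*B + C*D with one task per product instead of three parallel loops.
Mat sum_of_products(const Mat& a, const Mat& b, const Mat& c, const Mat& d) {
    // A*B runs on its own thread ...
    std::future<Mat> prod1 = std::async(std::launch::async,
                                        [&a, &b] { return multiply(a, b); });
    // ... while C*D runs on the calling thread.
    Mat e = multiply(c, d);

    // The only synchronization point: wait for A*B, then add it in.
    const Mat ab = prod1.get();
    assert(ab.rows == e.rows && ab.cols == e.cols);
    for (std::size_t i = 0; i < e.data.size(); ++i) {
        e.data[i] += ab.data[i];
    }
    return e;
}
```

You would call it as sum_of_products(A, B, C, D). The two products never
share data while they run, and the single future is the only place where
the threads have to meet. A real version would hand these tasks to a
scheduler instead of creating a thread per expression.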

But changing that code to use a very lightweight scheduler and separate
tasks with minimal locking and still the best data locality is hard work.
But it'll be worth it - for the big matrix case (I *guess* small, fixed
matrices won't benefit from parallelization at the EIGEN level. There the
bigger algorithms have to take care of it).


