|Re: [eigen] FLENS C++ expression template Library has excellent documentation|
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] FLENS C++ expression template Library has excellent documentation
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Sat, 18 Apr 2009 12:12:03 -0400
2009/4/18 Christian Mayer <mail@xxxxxxxxxxxxxxxxx>:
> Do NOT use OpenMP in our case!
> OpenMP is great for parallelizing a few loops in old code where you can't
> spend the time to do it right. If you have the choice, you should
> always rethink every algorithm and implement it in a parallel way (with
> the threading library of your choice).
> You get much more control and you can plan ahead.
> In the case of Eigen, with the expression templates we have a very strong
> base that can support such an approach. Since the compiler knows the
> calculation ahead of time, it could parallelize some calculations. E.g. look at
> the expression:
> E = A*B + C*D
> You could do it the dumb way: use a parallel A*B, then a parallel C*D,
> and at the end a parallel E = prod1 + prod2. That's what OpenMP could
> offer you.
> But wouldn't it be much wiser to run A*B in one thread, C*D in another,
> and then E = prod1 + prod2? That's much better for data locality...
There are different use cases:
- many products of many small matrices
- one product of two large matrices
In the 2nd case, if we want to do parallelization at all, we really need
to parallelize the for loops inside the matrix product itself.
Then yes, as discussed above, the first step is to rethink the
algorithm to do the product by blocks. But ultimately we need to start
multiple threads, and OpenMP seemed like one of the possibilities for
doing that. I agree that it's not at all about OpenMP-ifying for loops;
rather, OpenMP is possibly the only dependency-free, portable
way of doing threads. Then of course we can consider adding an
optional dependency on a threads library instead.
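For the large-matrix case, the inner-loop parallelization mentioned above could look like this minimal sketch. It uses a naive, unblocked kernel purely for illustration (not Eigen's blocked implementation); the point is that a compiler without OpenMP support simply ignores the pragma and runs the loop serially, which is what makes this route dependency-free.

```cpp
// Sketch: parallelizing the outer loop of a row-major n x n matrix
// product with OpenMP. Compiled with -fopenmp, rows of C are computed by
// different threads; compiled without it, the pragma is ignored and the
// same code runs serially.
#include <vector>

void matmul(const std::vector<double>& A, const std::vector<double>& B,
            std::vector<double>& C, int n) {
    // Each iteration of i writes a distinct row of C, so the iterations
    // are independent and safe to distribute across threads.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```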