Re: [eigen] FLENS C++ expression template Library has excellent documentation
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] FLENS C++ expression template Library has excellent documentation
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Sat, 18 Apr 2009 12:06:43 -0400
2009/4/18 Ilya Baran <baran37@xxxxxxxxx>:
> Hi, Benoit,
>
> Wow, you're looking far ahead :)
>
> While fully automatic generation of optimized implementations is their
> "ultimate goal," I think their methods already allow a much simpler
> manual implementation. Look, for example, at their Cholesky code:
>
> http://z.cs.utexas.edu/wiki/LA.wiki/Chol_l/FLAMEC/BlkVar3
OK, I'm not sure I understand. This code seems to implement a blocked
Cholesky by calling a lot of convenience functions that help with the
partitioning. So is that what FLAME is: a set of helper functions for
writing blocked algorithms?
If so, then Eigen can be improved to allow that kind of thing without
much effort: our Block expression already provides part of it, as does
our comma-initializer combined with expressions, so I think it's easy
to add the rest (like repartitioning). I understand that they had to
add a lot of C functions on top of BLAS to deal with blocks, but thanks
to expressions we don't need to.
I really do mean to implement a blocked LLt or partial LU soon; it can
be written cleanly with Eigen blocks.
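Just to make that concrete, here is roughly the loop structure I have
in mind (an untested sketch; blockedLLt is a made-up name, and the
triangular solve is done naively via inverse() to keep it short):

  #include <Eigen/Dense>
  #include <algorithm>
  using namespace Eigen;

  // in-place blocked Cholesky (lower, LL^T) written only with block() views
  void blockedLLt(MatrixXd& A, int blockSize)
  {
    const int n = A.rows();
    for (int k = 0; k < n; k += blockSize)
    {
      const int b = std::min(blockSize, n - k); // current diagonal block size
      const int r = n - k - b;                  // size of the trailing part

      // base case: unblocked factorization of the diagonal block A11
      MatrixXd L11 = A.block(k, k, b, b).llt().matrixL();
      A.block(k, k, b, b) = L11;

      if (r > 0)
      {
        // A21 <- A21 * L11^{-T}  (in real code: a triangular solve)
        A.block(k + b, k, r, b) =
            A.block(k + b, k, r, b) * L11.transpose().inverse();

        // A22 <- A22 - A21 * A21^T : the BLAS-3-rich trailing update
        A.block(k + b, k + b, r, r) -=
            A.block(k + b, k, r, b) * A.block(k + b, k, r, b).transpose();
      }
    }
  }

The FLAME-style partitioning/repartitioning helpers would essentially
just generate these block() index computations for you.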
What I was more thinking about when I said 2015 was the automatic
transformation of non-blocked algorithms into blocked algorithms. But
maybe I misunderstood!
Cheers,
Benoit
>
> Aside from some LAPACK-style names, this is extremely easy to read and
> verify (and would be even easier in C++) and is rich in BLAS 3
> operations (I think Eigen's current LLT implementation is BLAS 2 based
> and hence slows down for large matrices--but it makes a great "base
> case" for a blocked algorithm).
>
> I haven't worked it out, but I think it's likely that the
> bidiagonalization part of the SVD you're planning to write can be
> formulated with a similar template.
>
> -Ilya
>
> On Sat, Apr 18, 2009 at 11:01 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> Wow, this is very interesting, thanks for the link.
>>
>> Notice that even they call it an "ultimate goal" to generate optimized
>> code automatically -- they're not there yet in general. But yes, it's
>> very interesting.
>>
>> Once they finish working out the general theory, an implementation
>> could theoretically be done by C++ metaprogramming, so why not in
>> Eigen, although it remains to be seen what the compilation times would
>> be.
>>
>> More stuff for eigen 3.0 in 2015 ;)
>>
>> Benoit
>>
>> 2009/4/18 Ilya Baran <baran37@xxxxxxxxx>:
>>> Hello,
>>>
>>> Let me throw another library I stumbled on into the discussion pot:
>>>
>>> libFLAME: http://www.cs.utexas.edu/users/flame/
>>>
>>> This is by the same group that employs Goto, I think.
>>>
>>> As far as I understand, the main idea behind this is that many LAPACK
>>> algorithms have a similar block-based structure that allows efficient
>>> use of BLAS 3. They build some generic operations that simplify the
>>> common steps (partitioning, recursion, traversal), making efficient
>>> code for a particular algorithm much smaller and easier to write. To
>>> avoid the overhead of the recursion and bookkeeping, the algorithm
>>> must still be implemented for a base case of nontrivial size, but the
>>> performance of that becomes less critical for large matrix sizes.
>>> They also use this structure to do parallelization, but I don't know
>>> much about that.
>>>
>>> It certainly doesn't make sense for Eigen to merge with them in any
>>> way, but I'm wondering whether the generic structure could easily be
>>> reimplemented in Eigen (with its support for Block views) to simplify
>>> coding high-level algorithms.
>>>
>>> Thanks,
>>>
>>> -Ilya
>>>
>>> On Fri, Apr 17, 2009 at 3:49 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>>> 2009/4/17 Christian Mayer <mail@xxxxxxxxxxxxxxxxx>:
>>>>> (Disclaimer: I don't know FLENS)
>>>>>
>>>>> FLENS and EIGEN have totally different use cases:
>>>>> - EIGEN is a lib that gives you the best possible performance for
>>>>> small, fixed-size matrices and vectors (e.g. those that are typical
>>>>> for 3D intensive applications)
>>>>> - BLAS/LAPACK gives you the best performance (using the right
>>>>> implementation) for big, variable-sized matrices and vectors (i.e. those
>>>>> used in numerical applications). FLENS adds a modern, object-oriented
>>>>> wrapper around this functionality.
>>>>>
>>>>> In this case both libs can peacefully coexist...
>>>>>
>>>>> But as EIGEN supports variable-sized matrices as well, both are
>>>>> starting to compete in exactly the same field of use. EIGEN has the
>>>>> advantage that the expression templates are the base and not something
>>>>> built on top, i.e. EIGEN can optimize "between" BLAS function calls.
>>>>
>>>> That's an accurate summary :)
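>>>>
>>>> To make the "between BLAS calls" point concrete, a toy example with
>>>> Eigen (x, y, z being VectorXd of equal size):
>>>>
>>>>   y = 3*x + z; // one fused loop; with a BLAS API this is a copy
>>>>                // followed by an axpy, i.e. two passes over the data
>>>>
>>>> The expression templates let us generate the single loop directly.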
>>>>
>>>>> FLENS has the advantage that it can use extremely optimized BLAS
>>>>> libraries (e.g. Intel MKL), something that EIGEN can't do (as it's
>>>>> cross-platform) and won't do (as it doesn't have the funding that MKL
>>>>> has as a marketing platform for Intel...).
>>>>> => It would be interesting to see a benchmark of a nontrivial numerical
>>>>> algorithm to see which approach wins.
>>>>
>>>> Our (updated) benchmarks on the wiki clearly show that as long as you
>>>> use only 1 thread, we have the same level of performance as Intel MKL
>>>> for many important operations, suggesting that we could have the same
>>>> level of performance for all operations given enough contributions.
>>>>
>>>> This is made possible by 2 facts:
>>>> 1) we have much more generic code, so it takes us far less effort to
>>>> optimize;
>>>> 2) for the rest, when generic code doesn't cut it (e.g. the matrix
>>>> product), Gael is an amazing coder :)
>>>>
>>>> So, in which area does Intel MKL still have a long-term lead? I would
>>>> say parallelization. We haven't started that yet and it is probably a
>>>> very, very tough one. It's what I have in mind when I say that a
>>>> BLAS/LAPACK wrapper is still welcome.
>>>>
>>>>> But as EIGEN could include a BLAS/LAPACK lib as well, there shouldn't be
>>>>> a way for FLENS to win...
>>>>> Perhaps it's best to convince the FLENS author to join efforts?
>>>>
>>>> It's hard to do without sounding offensive :) Also, adding a
>>>> BLAS/LAPACK wrapper to Eigen wouldn't be very difficult, so he would
>>>> feel that not much of FLENS would survive in Eigen.
>>>>
>>>> While we're discussing other libraries, I think that an interesting one is NT2:
>>>>
>>>> http://nt2.sourceforge.net/
>>>>
>>>> I had an email conversation with its author, so here's what I know.
>>>> It's a C++ template library offering only very basic functionality,
>>>> and wrapping around LAPACK for advanced stuff. So in that respect, it
>>>> is similar to FLENS. The difference is that NT2 is extremely
>>>> aggressive on the expression-templates front. It is based on
>>>> Boost::proto, which gives it a very high-level view of expression
>>>> templates, performing a lot of impressive global transformations on
>>>> expressions. He gets "for free" things that were hard to implement by
>>>> hand in Eigen such as the automatic introduction of temporaries where
>>>> appropriate. The downside is very long compilation times -- 3 seconds
>>>> for a trivial program and 10 seconds for a typical file, and remember
>>>> that this is only for basic operations, since for the nontrivial stuff it
>>>> relies on LAPACK. Extrapolating, this suggests an order of magnitude
>>>> of 1 minute to compile any of our big linear algebra algorithms.
>>>> Another critique I'd formulate is that, like Boost::ublas, it treats
>>>> expression templates only as an optimization that you can enable or
>>>> disable, so it doesn't leverage them to achieve a better API the way
>>>> Eigen does.
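>>>>
>>>> To illustrate the temporaries point above, the canonical case is:
>>>>
>>>>   MatrixXd m(100, 100);
>>>>   m = m * m; // fusing this into one loop would read entries of m
>>>>              // that were already overwritten, so a temporary must
>>>>              // be introduced for the product
>>>>
>>>> NT2/proto deduces that from the expression tree, while in Eigen it is
>>>> handled by hand in the product logic.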
>>>>
>>>> Still, this got me thinking. Relying on Boost::proto is a no-go in
>>>> 2009, as it makes compilation times awful. But what about in 5 years?
>>>> If compilers improve enough by then, that could become very
>>>> interesting.
>>>>
>>>> Cheers,
>>>> Benoit