Re: [eigen] FLENS C++ expression template Library has excellent documentation

Hi, Benoit,

Wow, you're looking far ahead :)

While fully automatic generation of optimized implementations is
their "ultimate goal," I think their methodology already allows much
simpler manual implementations.  Look, for example, at their Cholesky code:

http://z.cs.utexas.edu/wiki/LA.wiki/Chol_l/FLAMEC/BlkVar3

Aside from some LAPACK-style names, this is extremely easy to read
and verify (and would be even easier in C++), and it is rich in BLAS 3
operations.  I think Eigen's current LLT implementation is BLAS 2
based and hence slows down for large matrices, but it would make a
great "base case" for a blocked algorithm.

I haven't worked it out, but I think it's likely that the
bidiagonalization part of the SVD you're planning to write can be
formulated with a similar template.

   -Ilya

On Sat, Apr 18, 2009 at 11:01 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> Wow, this is very interesting, thanks for the link.
>
> Notice that even they call it an "ultimate goal" to generate optimized
> code automatically -- they're not there yet in general. But yes, it's
> very interesting.
>
> Once they finish working out the general theory, an implementation
> could theoretically be done with C++ metaprogramming, so why not in
> Eigen, although it remains to be seen what the compilation times
> would be.
>
> More stuff for eigen 3.0 in 2015 ;)
>
> Benoit
>
> 2009/4/18 Ilya Baran <baran37@xxxxxxxxx>:
>> Hello,
>>
>> Let me throw another library I stumbled on into the discussion pot:
>>
>> libFLAME: http://www.cs.utexas.edu/users/flame/
>>
>> This is by the same group that employs Goto, I think.
>>
>> As far as I understand, the main idea is that many LAPACK
>> algorithms share a similar block-based structure that allows
>> efficient use of BLAS 3.  They build generic operations that
>> simplify the common steps (partitioning, recursion, traversal),
>> making efficient code for a particular algorithm much smaller and
>> easier to write.  To avoid the overhead of the recursion and
>> bookkeeping, the algorithm must still be implemented for a base
>> case of nontrivial size, but that base case's performance becomes
>> less critical at large matrix sizes.
>> They also use this structure to do parallelization, but I don't know
>> much about that.
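>>
>> Very roughly, I imagine the common traversal looking like the sketch
>> below (the names are mine, not libFLAME's):
>>
>>     #include <Eigen/Dense>
>>     #include <algorithm>
>>
>>     // One generic sweep down the diagonal: at each step, expose the
>>     // current diagonal block A11 and the panels A21/A22 below and to
>>     // the right of it. Each algorithm (Cholesky, LU, ...) supplies
>>     // only its own BLAS 3 updates via the functor.
>>     template <typename Update>
>>     void flame_style_sweep(Eigen::MatrixXd& A, int nb, Update update)
>>     {
>>         const int n = static_cast<int>(A.rows());
>>         for (int k = 0; k < n; k += nb)
>>         {
>>             const int b = std::min(nb, n - k);
>>             update(A.block(k, k, b, b),                          // A11
>>                    A.block(k + b, k, n - k - b, b),              // A21
>>                    A.block(k + b, k + b, n - k - b, n - k - b)); // A22
>>         }
>>     }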
>>
>> It certainly doesn't make sense for Eigen to merge with them in any
>> way, but I'm wondering whether the generic structure could easily be
>> reimplemented in Eigen (with its support for Block views) to simplify
>> coding high-level algorithms.
>>
>> Thanks,
>>
>>   -Ilya
>>
>> On Fri, Apr 17, 2009 at 3:49 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>> 2009/4/17 Christian Mayer <mail@xxxxxxxxxxxxxxxxx>:
>>>> (Disclaimer: I don't know FLENS)
>>>>
>>>> FLENS and EIGEN have totally different use cases:
>>>> - EIGEN is a lib that gives you the best possible performance for
>>>> small, fixed-size matrices and vectors (e.g. those typical of
>>>> 3D-intensive applications).
>>>> - BLAS/LAPACK gives you the best performance (with the right
>>>> implementation) for big, variable-sized matrices and vectors (i.e.
>>>> those used in numerical applications). FLENS adds a modern,
>>>> object-oriented wrapper around this functionality.
>>>>
>>>> In this case both libs can peacefully coexist...
>>>>
>>>> But as EIGEN supports variable-sized matrices as well, the two are
>>>> starting to compete in exactly the same field of use. EIGEN has the
>>>> advantage that expression templates are its foundation rather than
>>>> something built on top, i.e. EIGEN can optimize "between" BLAS
>>>> function calls.
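>>>>
>>>> For example (illustrative, with Eigen types):
>>>>
>>>>     Eigen::VectorXd u, v, x, w;    // assume all the same size
>>>>     w = 2.0 * u + 3.0 * v - x;     // compiles to a single fused loop
>>>>
>>>> A BLAS-based wrapper would instead issue several axpy/copy calls,
>>>> each making its own pass over the data.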
>>>
>>> That's an accurate summary :)
>>>
>>>> FLENS has the advantage that it can use extremely optimized BLAS
>>>> libraries (e.g. Intel MKL), something that EIGEN can't do (as it's
>>>> cross-platform) and won't do (as it doesn't have the funding that
>>>> MKL has as a marketing platform for Intel...).
>>>> => It would be interesting to see a benchmark of a non-trivial
>>>> numerical algorithm to see which approach wins.
>>>
>>> Our (updated) benchmarks on the wiki clearly show that, as long as
>>> you use only 1 thread, we have the same level of performance as Intel
>>> MKL for many important operations, suggesting that we could reach the
>>> same level for all operations given enough contributions.
>>>
>>> This is made possible by 2 facts:
>>> 1) our code is much more generic, so it takes us far less effort
>>> to optimize;
>>> 2) for the rest, when generic code doesn't cut it (e.g. the matrix
>>> product), Gael is an amazing coder :)
>>>
>>> So, in which area does Intel MKL still have a long-term lead? I would
>>> say parallelization. We haven't started that yet and it is probably a
>>> very, very tough one. It's what I have in mind when I say that a
>>> BLAS/LAPACK wrapper is still welcome.
>>>
>>>> But as EIGEN could include a BLAS/LAPACK lib as well, there shouldn't be
>>>> a way for FLENS to win...
>>>> Perhaps it's best to convince the FLENS author to join efforts?
>>>
>>> It's hard to do that without sounding offensive :) Also, adding a
>>> BLAS/LAPACK wrapper to Eigen wouldn't be very difficult, so he might
>>> feel that not much of FLENS would survive in Eigen.
>>>
>>> While we're discussing other libraries, I think that an interesting one is NT2:
>>>
>>> http://nt2.sourceforge.net/
>>>
>>> I had an email conversation with its author, so here's what I know.
>>> It's a C++ template library offering only very basic functionality,
>>> and wrapping LAPACK for the advanced stuff. So in that respect, it
>>> is similar to FLENS. The difference is that NT2 is extremely
>>> aggressive on the expression-templates front. It is based on
>>> Boost.Proto, which gives it a very high-level view of expression
>>> templates, performing a lot of impressive global transformations on
>>> expressions. It gets "for free" things that were hard to implement
>>> by hand in Eigen, such as the automatic introduction of temporaries
>>> where appropriate. The downside is very long compilation times: 3
>>> seconds for a trivial program and 10 seconds for a typical file, and
>>> remember that this covers only basic operations, since for the
>>> nontrivial stuff it relies on LAPACK. Extrapolating, this suggests
>>> an order of magnitude of one minute to compile any of our big linear
>>> algebra algorithms. Another criticism I'd make is that, like
>>> Boost.uBLAS, it treats expression templates only as an optimization
>>> that you can enable or disable, so it doesn't leverage them to
>>> achieve a better API the way Eigen does.
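>>>
>>> (By "automatic introduction of temporaries" I mean aliasing cases
>>> like this sketch; in Eigen, the rule that a matrix product evaluates
>>> into a temporary is hand-coded:)
>>>
>>>     Eigen::MatrixXd m = Eigen::MatrixXd::Random(100, 100);
>>>     // The right-hand side reads m while the left-hand side writes
>>>     // it, so a temporary is needed; a Proto-based library can
>>>     // discover that from the whole expression tree.
>>>     m = m * m;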
>>>
>>> Still, this got me thinking. Relying on Boost.Proto is a no-go in
>>> 2009 as it makes compilation times awful. But what about in 5 years?
>>> If compilers improve enough by then, this approach could become very
>>> interesting.
>>>
>>> Cheers,
>>> Benoit


