Re: [eigen] ISO C++ working groups on Linear Algebra / Machine Learning



Francois, you are 100% correct! :)
I see it the same way! Eigen without the tensor module is only the 2-tensor abstraction we mostly use ;). But Numpy's dot function is in fact a good example. I would rather generalise it as contract ;)

Sent from my iPhone

On 16 Feb 2019, at 14:19, Francois Fayard <fayard@xxxxxxxxxxxxx> wrote:

Hi,

Let me put another euro in and say hello to Matthieu.

Let me first explain my usage of Eigen. I personally do not develop my own software using Eigen, but as my job is mainly about speeding up scientific software, I tend to run across a lot of code using Eigen. I have also spent some time optimising machine learning algorithms, so here are my two cents on what I would expect from a “scientific library”.
I strongly believe in data structures first. As Niklaus Wirth has pointed out, “Algorithms + Data Structures = Programs”. As a scientific developer, I mainly need two kinds of data structures: arrays and hash tables. Let's forget hash tables for the time being and do everything as if std::unordered_map were a good implementation of a hash table. So we need good arrays, and unfortunately the C++ standard is awful in that regard.

- We need one-dimensional arrays (a kind of std::vector), two-dimensional arrays (a kind of Eigen::Array), three-dimensional arrays, etc. Two-dimensional arrays are perfect for representing matrices, but I don't believe we should be forced to think of them as matrices. For instance, suppose I have a family of n points in the plane. I would like to store them in a two-dimensional array Array2D<double> point(n, 2) and access them very efficiently component-wise if I want to, with point(i, k) (0 <= i < n and k = 0 or k = 1). I should be able to consider “point” as a matrix if I want to, but I should not be forced to. Eigen provides Eigen::Array, which works, but as soon as you want a family of points indexed by two integers (an Array3D<double> point(n, n, 2)), you cannot do that anymore with Eigen. And using Eigen::Matrix<std::array<double, 2>> is not a solution if you don't want the x and y components of the points stored close to each other in memory. In a nutshell, you cannot mimic an Array3D<double> stored in Fortran order.
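
To make this concrete, here is a minimal sketch of both cases; the rank-3 case only works through Eigen's unsupported Tensor module, which is column-major (Fortran order) by default:

#include <Eigen/Core>
#include <unsupported/Eigen/CXX11/Tensor>

void sketch(int n) {
  // n points in the plane: component k of point i is point2d(i, k).
  Eigen::Array<double, Eigen::Dynamic, 2> point2d(n, 2);
  point2d(0, 0) = 1.0;  // x of point 0
  point2d(0, 1) = 2.0;  // y of point 0

  // Points indexed by two integers need a rank-3 container, which the
  // core Array cannot provide; the unsupported Tensor module can.
  Eigen::Tensor<double, 3> point3d(n, n, 2);
  point3d(0, 0, 0) = 1.0;
  point3d(0, 0, 1) = 2.0;
}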

- Once those data structures are built, we need linear algebra for those who need it. I personally like a dot function that can do the following: y = dot(A, x) or C = dot(A, B). The first operation does a “matrix multiplication” between a two-dimensional array A and a one-dimensional array x and returns a one-dimensional array y. The second does a matrix multiplication between two two-dimensional arrays. If you want the dot product of two arrays, just type ps = dot(x, y). The dot function is just tensor contraction and can be generalised to any tensor. This is roughly what you get with Numpy.
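
As a rough sketch of what such a dot() family could look like on top of Eigen (the free functions here are hypothetical; only the contract() call is actual Tensor-module API):

#include <Eigen/Dense>
#include <unsupported/Eigen/CXX11/Tensor>

// Hypothetical dot() overloads in the Numpy spirit.
Eigen::VectorXd dot(const Eigen::MatrixXd& A, const Eigen::VectorXd& x) {
  return A * x;     // y = dot(A, x)
}
Eigen::MatrixXd dot(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B) {
  return A * B;     // C = dot(A, B)
}
double dot(const Eigen::VectorXd& x, const Eigen::VectorXd& y) {
  return x.dot(y);  // ps = dot(x, y)
}

// The general case is a tensor contraction: contract index 1 of A with
// index 0 of B, which is exactly what a matrix product does.
Eigen::Tensor<double, 2> dot(const Eigen::Tensor<double, 2>& A,
                             const Eigen::Tensor<double, 2>& B) {
  Eigen::array<Eigen::IndexPair<int>, 1> dims = {Eigen::IndexPair<int>(1, 0)};
  return A.contract(B, dims);
}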

- For the “star” problem, I don't have any advice as I don't care. I would personally do dot(A, B) for the matrix product and hadamard(A, B) for the Hadamard product.
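
In the same hypothetical vein, hadamard() would be a one-liner on top of Eigen's existing cwiseProduct():

#include <Eigen/Dense>

// Hypothetical named component-wise (Hadamard) product.
Eigen::MatrixXd hadamard(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B) {
  return A.cwiseProduct(B);  // C(i, j) = A(i, j) * B(i, j)
}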

In a nutshell, I really like the way Numpy works and I believe it is the way to go: provide efficient multidimensional arrays and then provide linear algebra on top of that. Those two should be decoupled, and it should be possible to provide the multidimensional arrays without the linear algebra part.

François Fayard


On 16 Feb 2019, at 10:26, Gabriel Nützi <gnuetzi@xxxxxxxxx> wrote:

Hi all,

Just to put another dollar in:

For me, Eigen is good for one simple reason: it tries to stick to mathematical conventions. The tensor module, with the recent discussions about Einstein notation, is a good example. Keeping code and maths as close together as possible is the ultimate goal, since anything else only introduces another layer of indirection. I constantly encourage people to write down the maths before implementing anything, which in industry seldom happens because people are afraid of maths, leading to strange computations, errors, and hard-to-understand code. If you now introduce syntax that is far from common mathematical notation, the problem gets even worse. So to all the matrix people, data scientists, linear algebra people etc.: please keep Eigen free of the strange constructs you see in other libraries. Keep in mind: linear algebra is at the core of nearly every problem you solve in maths and eventually on the computer. Why? Because linear algebra is probably the most elaborate and widespread mathematical framework since the formalisation of the vector space. Our models of the real world all tend towards a linear-algebra core, because it is a framework we can compute in (primal space, dual space, multilinear algebra where tensors form their basis vectors).

Code should be written in accordance with the maths, and not the other way around. Of course there are several notations around, but the problem arises when people from computer science implement these concepts while only seeing the end result and maybe not the whole picture (not saying that I have it, but a lot of people contributing to Eigen do). What people also often fail to understand when computing with matrices is that every number on a computer, every array, every matrix, is the result of choosing a basis in your vector space and representing your vector or linear map in it, giving a coordinate tuple or a matrix, respectively. This concept should probably not be embedded into a standardisation, because the overhead would scare people off, but sometimes you wish people knew better: you see lots of code where the wrong quantities get multiplied together, producing utter nonsense, and that is not something you recognise when reading the code, especially when it is obscured by strange operator syntax.

So my whole point: make the thing consistent with maths and try to find a way not to overcomplicate it. That's what makes Eigen a great library 👌👌.


Feel free to comment on my critical points here.

BR Gabriel Nützi

Sent from my iPhone

On 15 Feb 2019, at 23:51, Matthieu Brucher <matthieu.brucher@xxxxxxxxx> wrote:

Haha, it's not that we can't stand the matrix multiplication, it's that objectively a huge chunk of the scientists (probably the majority) are not using '*' as the matrix multiplication (from Fortran to Numpy...).
One of the other things to notice about data scientists is that we don't consider everything to be 2D, or 2D+ like Matlab; we use 1D arrays as much as nD arrays, which makes a big difference.
I use Eigen a lot for my personal projects that do some linear algebra, and even there, it can be annoying to jump from arrays to matrices and back. That's one of the biggest advantages of Numpy: just one type (the Numpy matrix type is not relevant nowadays).

Cheers,

Matthieu


On Thu, 14 Feb 2019 at 18:29, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

Hi Patrik,

On Wed, Feb 13, 2019 at 6:19 PM Patrik Huber <patrikhuber@xxxxxxxxx> wrote:
I recently found out that the C++ standardisation committee has created a Special Interest Group (SIG) for Linear Algebra within SG14 (the study group for game-dev & low-latency), and that there's also a new study group on machine learning, SG19.
Both groups have been formed within the last few months, and I don't think there has been much noise around them, so I thought it might be quite interesting for some people on the Eigen list. I also just joined their forums/lists,

Thanks a lot for this information! I joined both the SG14 and SG19 lists too.
 
and I didn't recognise any familiar names from the Eigen list so far.

There is Matthieu Brucher, who is a member of this list and has posted here a few times.
 
At first glance, I saw that they seem to be making a few design decisions that differ from Eigen (e.g. operator* is only for scalar multiplication, and there are currently separate row/col_vector classes).

Regarding operator*, from their discussions we can already see clear disagreements between "linear algebra" people and more general "data scientists"... Some cannot stand operator* being a matrix-matrix product and are basically looking for a Numpy on steroids. Personally, as I mostly do linear algebra, I almost never use the component-wise product, and I would have a hard time giving up operator* for matrix-matrix products. On the other hand, I find myself using .array() frequently for scalar addition, abs, min/max and comparisons... and I have to admit that our .array()/.matrix() approach is not ideal in this respect.
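
For illustration, the kind of round-trips I mean look like this in current Eigen:

#include <Eigen/Dense>

void example(const Eigen::MatrixXd& m) {
  // Scalar addition, abs and comparisons live on the array side,
  // so matrix code keeps switching views back and forth.
  Eigen::MatrixXd shifted = (m.array() + 1.0).matrix();  // scalar addition
  Eigen::MatrixXd mags    = m.array().abs().matrix();    // coefficient-wise abs
  bool anyNegative        = (m.array() < 0.0).any();     // comparison + reduction
}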

Nevertheless, following the idea of a "Numpy on steroids", if that's really what developers want, we might think about making our "array world" more friendly with the linear-algebra world by (a rough sketch follows the list):
- adding a prod(,) (or dot(,)) function
- moving more MatrixBase functions to DenseBase (most of them could move, except inverse())
- allowing Array<>  as input of decompositions
- enabling "safe" binary operations between Array and Matrix (and returning an Array)
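
The first and last items might look roughly like this (a hypothetical sketch, not existing Eigen API):

#include <Eigen/Dense>

// Hypothetical prod(): a named matrix product for Array<> users.
Eigen::ArrayXXd prod(const Eigen::ArrayXXd& A, const Eigen::ArrayXXd& B) {
  return (A.matrix() * B.matrix()).array();
}

// Hypothetical "safe" mixed operation: Array + Matrix returns an Array.
Eigen::ArrayXXd operator+(const Eigen::ArrayXXd& A, const Eigen::MatrixXd& M) {
  return A + M.array();  // component-wise sum in the array world
}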

This way, people who don't want operator* as a matrix product, or who have strong Numpy experience, could simply forget about Matrix<>/.matrix()/.array() and exclusively use Array<>. Then time will tell whether, as with numpy.matrix vs numpy.array, everybody gives up on Matrix<>... (I strongly doubt it).

Gaël.
 

I think it would be really great to get some people from Eigen (us!) aboard that process.
Here are some links:

SG14 mailing list: http://lists.isocpp.org/sg14/

There are two repos where they have started mapping out ideas/code/papers:

Best wishes,

Patrik

--
Dr. Patrik Huber
Founder & CEO 4dface Ltd
3D face models & personal 3D face avatars for professional applications
United Kingdom




