Re: [eigen] Tiny matrix in Eigen2



Benoit Jacob wrote:
2009/9/17 WANG Xuewen <xuewen.wang@xxxxxxxxx>:
I have no problem changing NMaxTinyMatrixDimension to an even number (for
example 6 or 8) and switching to Aligned so that vectorization becomes
possible. I've done a quick test and it does seem to improve things (I'll do
a few more tests to be sure).

OK, let's say right away that this still won't give you very good
vectorization. First, out of the 3 vectorization strategies in
Assign.h, only 2 will be applicable here. Second, since the size is
dynamic, they will have to do some runtime logic that pays off for
large enough matrices, but perhaps not at such small sizes.

So you can always try but it's not guaranteed to be faster, and may
even be slower (if the runtime logic is too costly to pay for itself).
In that sense, your DontAlign makes sense.

Actually, I tested only with dimension = 2 and it gave comparable results.
But now with dimension = 3 it is much slower, so you are right.
If the dynamic dimension is smaller than the maximum of 6 or 8, for
example 3, will I still be able to gain from vectorization?

A packet of 'double' contains 2 doubles. If the size is 3, only 1
packet can fit inside, and there remains 1 double, so vectorization
only lets you do 2 operations instead of 3 (for vector ops).

I seriously doubt that that is worth it! I expect vectorization to
make things slower there.

vectorization works well when:
- either the sizes are large (>10)
- or they are known at compile time and multiples of the packet size
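
The packet arithmetic above can be sketched in a few lines of plain C++. This is only an illustration, not Eigen code: OpCount and countOps are hypothetical names, and the count ignores the cost of the runtime logic mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Eigen): counts the operations needed
// for a coefficient-wise vector op on `size` scalars when `packetSize`
// scalars fit in one SIMD packet.
struct OpCount {
    std::size_t packetOps;  // operations done on full packets
    std::size_t scalarOps;  // leftover coefficients handled one by one
    std::size_t total() const { return packetOps + scalarOps; }
};

inline OpCount countOps(std::size_t size, std::size_t packetSize) {
    assert(packetSize > 0);
    return OpCount{size / packetSize, size % packetSize};
}
```

With size 3 and packets of 2 doubles this gives 1 packet operation plus 1 scalar operation, i.e. 2 operations instead of 3. With size 8 it gives 4 packet operations and no leftover, which is the "multiple of the packet size" case.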

2. Suppose the storage is on the stack but the dimension is not known
at compile time. Will the loops get unrolled for most operations?

No, no loop will be unrolled at all, since that requires knowing the
exact size at compile time.


Is it possible to extend Eigen2 to allow unrolling the loop?  We did it in our
home-made library with metaprogramming, but I'm not sure whether it is possible
to do it with Eigen2.

That's not possible to do without causing an overhead. If you want to
unroll here, you have to perform the operation on the whole allocated
array, not just the matrix. That requires initializing this array with
0's, otherwise you're slowed down by inf/nan values; and filling the
array with 0's has a cost. In Eigen, we don't initialize the arrays at
all, so creating a matrix on the stack has no cost at all.
Also, more issues appear if one wants to vectorize such operations.
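
A minimal sketch of that trade-off, in plain C++ rather than Eigen. PaddedVec, kCapacity and add are hypothetical names, and the capacity of 8 is an assumption. The loop runs over the whole allocated array, so its trip count is a compile-time constant that the compiler may unroll, but the price is the zero fill of the padding on every construction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCapacity = 8;  // assumed maximum dimension

// Hypothetical padded vector: storage always holds kCapacity doubles,
// only the first `size` of them are meaningful.
struct PaddedVec {
    std::array<double, kCapacity> data{};  // zero-filled: this is the cost
    std::size_t size = 0;                  // runtime dimension
};

// Adds over the whole allocated array, not just the first `size` entries.
// The padding must hold zeros so that no inf/nan values are processed.
inline PaddedVec add(const PaddedVec& a, const PaddedVec& b) {
    assert(a.size == b.size);
    PaddedVec r;
    r.size = a.size;
    for (std::size_t i = 0; i < kCapacity; ++i)  // constant trip count
        r.data[i] = a.data[i] + b.data[i];
    return r;
}
```

For a dimension of 3 this does 8 additions instead of 3 and zero-fills 8 doubles per temporary, which is the overhead referred to above.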

Indeed, our internal code cannot take advantage of vectorization since it
manually unrolls the loop.
I may take a look if you tell me that it is possible
and give me a hint on how. Or would vectorization be more interesting anyway?

I'd rather point you to another direction: are you really sure that
the size is not known at compile time?

Since you're dealing with such small sizes, there aren't many
combinations anyway. Perhaps you could get a list of all the sizes
that are actually used, perhaps it's not that long. In that case it
may be a good idea to use completely fixed-size matrices. That means
that a lot more code will be instantiated, but that may be worth it if
you're after performance! You would then have if()'s in your code to
select between the code paths optimized for separate sizes.
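
One possible shape for such a dispatch, sketched in plain C++ with raw arrays instead of Eigen types. dotFixed, dotDynamic, dot and the list of sizes (2, 3, 4, 6) are all hypothetical; a switch is used here in place of a chain of if()'s.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size kernel: N is a compile-time constant, so the compiler
// can fully unroll the loop.
template <std::size_t N>
double dotFixed(const double* a, const double* b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) sum += a[i] * b[i];
    return sum;
}

// Generic fallback for sizes without a dedicated code path.
inline double dotDynamic(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Selects the fixed-size path from the runtime size.
inline double dot(const double* a, const double* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    switch (n) {
        case 2: return dotFixed<2>(a, b);
        case 3: return dotFixed<3>(a, b);
        case 4: return dotFixed<4>(a, b);
        case 6: return dotFixed<6>(a, b);
        default: return dotDynamic(a, b, n);
    }
}
```

Each listed size instantiates its own copy of the kernel, which is the extra code mentioned above. With Eigen the kernels would presumably take fixed-size matrix types as template arguments instead of raw pointers.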

I've thought about that possibility. It will make the code a bit ugly (code
size is not really an issue here), especially since I may need the if()'s in
several places. But I'll try anyway.
M = N;
M.diagonal().cwise() += scalar;

(don't remember, perhaps this cwise()+= requires #include<Eigen/Array>)



This doesn't work with 2.0.5 (doesn't compile, I tried it from the
beginning)

Here it works also with 2.0.5, are you sure that you #include<Eigen/Array>?

What's your compiler error?
OK, my fault. Maybe I wrote M.diagonal() += scalar;

thanks,

Xuewen



