Re: [eigen] Advanced vectorization

[ Thread Index | Date Index | More Archives ]

On 06.06.2011 13:34, Benoit Jacob wrote:
2011/6/6 Márton Danóczy<marton78@xxxxxxxxx>:
Hi all,

I'm trying to optimize the evaluation of a squared loss function and
its gradient. That is, I'd like to calculate

L = 0.5 ||Ax-b||^2
dL/dx = A'(Ax-b)

where A has more columns than rows, i.e. A'A would be huge and caching
it would be infeasible.

Right now, i have the following routine:

Scalar objfunc(const Matrix<Scalar, Dynamic, 1>&  x, Matrix<Scalar,
Dynamic, 1>&  g)
    e.noalias() = A * x - b;
    g.noalias() = A.adjoint() * e;
    return Scalar(0.5) * e.squaredNorm();

where /e/ is pre-allocated as a class member. When hand coding this,
instead of storing /e/, I would calculate it component-wise and
accumulate its squared norm along the way, thus avoiding iterating
twice. Is there a clever way to accomplish this with Eigen?

If A is of size n*n, then just reading it is n^2 memory accesses,
while you trick would be saving only n memory accesses (since e is a
vector of size n), so I'd say it's going to be negligible unless n is
very small. Also, by computing e coefficient-wise, you lose the
benefit of Eigen's cache-friendly matrix-vector product
implementation, which is important for large n.

I guess Márton's suggestion would first and foremost reduce the read-cost of A: If A is stored row-major, each row could be multiplied by x, and the resulting scalar times that row transposed can be added to g. Assuming that g and at least one row of A fit into the cache this could reduce RAM access by a factor of 2.


Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: (+49) 421-218-64252

Mail converted by MHonArc 2.6.19+