Re: [eigen] Advanced vectorization


>> If A is of size n*n, then just reading it is n^2 memory accesses,
>> while your trick would be saving only n memory accesses (since e is a
>> vector of size n), so I'd say it's going to be negligible unless n is
>> very small. Also, by computing e coefficient-wise, you lose the
>> benefit of Eigen's cache-friendly matrix-vector product
>> implementation, which is important for large n.
> I guess Márton's suggestion would first and foremost reduce the read cost of
> A:
> If A is stored row-major, each row could be multiplied by x, and the
> resulting scalar times that row, transposed, can be added to g. Assuming that
> g and at least one row of A fit into the cache, this could reduce RAM access
> by a factor of 2.

I did exactly that and benchmarked it:

Scalar objfunc(const Matrix<Scalar, Dynamic, 1>& x,
               Matrix<Scalar, Dynamic, 1>& g)
{
    // AT (the transpose of A, so each row of A is a column of AT)
    // and b are available in the enclosing scope.
    Scalar f(0);
    g.setZero();

    for (ptrdiff_t i = 0; i < AT.cols(); ++i)
    {
        typename Matrix<Scalar, Dynamic, Dynamic>::ColXpr ai = AT.col(i);
        Scalar e = ai.dot(x) - b(i); // residual of row i: a_i^T x - b_i
        g += e * ai;                 // accumulate e_i * a_i into the gradient
        f += e * e;
    }
    return Scalar(0.5) * f;
}

Benchmarking reveals that regardless of the size of A (1024x1024,
1024x4096, 4096x1024, 4096x4096), the original two-pass code is faster.
Conclusion: don't try to outsmart Gael (at least when your matrices
don't fit into the cache)...
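For reference, the two strategies being compared can be sketched without Eigen
in plain C++ (a minimal illustration, not the benchmarked code; the matrix and
vector types here are simple stand-ins I made up for self-containment). Both
compute f = 0.5 * ||A x - b||^2 and g = A^T (A x - b); the two-pass version
reads A twice, the one-pass row-wise version reads each row of A only once:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // row-major: A[i] is row i of A

// "Original" two-pass style: first e = A*x - b, then g = A^T * e.
// A is traversed twice (once per product).
double objfunc_two_pass(const Mat& A, const Vec& b, const Vec& x, Vec& g) {
    const std::size_t m = A.size(), n = x.size();
    Vec e(m, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) e[i] += A[i][j] * x[j];
        e[i] -= b[i];
    }
    double f = 0.0;
    for (std::size_t i = 0; i < m; ++i) f += e[i] * e[i];
    for (std::size_t j = 0; j < n; ++j) {
        g[j] = 0.0;
        for (std::size_t i = 0; i < m; ++i) g[j] += A[i][j] * e[i];
    }
    return 0.5 * f;
}

// One-pass row-wise style from the thread: for each row a_i, form the
// residual e_i = a_i^T x - b_i and immediately accumulate e_i * a_i into g,
// so each row of A is read only once.
double objfunc_one_pass(const Mat& A, const Vec& b, const Vec& x, Vec& g) {
    const std::size_t m = A.size(), n = x.size();
    for (std::size_t j = 0; j < n; ++j) g[j] = 0.0;
    double f = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
        double e = -b[i];
        for (std::size_t j = 0; j < n; ++j) e += A[i][j] * x[j];
        for (std::size_t j = 0; j < n; ++j) g[j] += e * A[i][j];
        f += e * e;
    }
    return 0.5 * f;
}
```

Both return identical f and g; the difference is purely in memory-traffic
pattern, which is what the benchmark above measures.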
