Re: [eigen] Advanced vectorization

>> If A is of size n*n, then just reading it is n^2 memory accesses,
>> while your trick would be saving only n memory accesses (since e is a
>> vector of size n), so I'd say it's going to be negligible unless n is
>> very small. Also, by computing e coefficient-wise, you lose the
>> benefit of Eigen's cache-friendly matrix-vector product
>> implementation, which is important for large n.
>
> I guess Márton's suggestion would first and foremost reduce the read
> cost of A: if A is stored row-major, each row could be multiplied by x,
> and the resulting scalar times that row (transposed) can be added to g.
> Assuming that g and at least one row of A fit into the cache, this
> could reduce RAM accesses by a factor of 2.
>
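
To spell out the reasoning (my notation): the objective and its gradient are

    f(x) = \tfrac{1}{2} \, \| A x - b \|^2
    \nabla f(x) = A^\top (A x - b) = \sum_i (a_i^\top x - b_i) \, a_i

where a_i^\top is the i-th row of A. The sum form is what permits the
fusion: each row a_i is read once and used twice, for the dot product
and for the rank-1 update of g.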

I did exactly that and benchmarked it:

// Snippet from a function template over Scalar: AT holds A transposed
// (so AT.col(i) is the i-th row of A) and b is the right-hand side.
Scalar objfunc(const Matrix<Scalar, Dynamic, 1>& x,
               Matrix<Scalar, Dynamic, 1>& g)
{
    Scalar f(0);
    g.resize(x.size());
    g.setZero();

    for (ptrdiff_t i = 0; i < AT.cols(); ++i)
    {
        typename Matrix<Scalar, Dynamic, Dynamic>::ColXpr ai = AT.col(i);
        Scalar e = ai.dot(x) - b(i);  // residual of row i
        g += e * ai;                  // row is read once, used twice
        f += e * e;
    }
    return Scalar(0.5) * f;
}
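
For reference, the original code I compare against is essentially the
plain expression-based version (a sketch with my naming; it assumes the
un-transposed A is also at hand):

Scalar objfunc_orig(const Matrix<Scalar, Dynamic, 1>& x,
                    Matrix<Scalar, Dynamic, 1>& g)
{
    // Two cache-friendly matrix-vector products: one for the residual,
    // one for the gradient.
    Matrix<Scalar, Dynamic, 1> e = A * x - b;
    g.noalias() = A.transpose() * e;
    return Scalar(0.5) * e.squaredNorm();
}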

Benchmarking reveals that regardless of the size of A (1024x1024,
1024x4096, 4096x1024, 4096x4096), the original code is faster.
Conclusion: don't try to outsmart Gael (at least if your matrices
don't fit into the cache)...

Marton


