Re: [eigen] Advanced vectorization

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Advanced vectorization
From: Márton Danóczy <marton78@xxxxxxxxx>
Date: Mon, 6 Jun 2011 20:44:38 +0200
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:from:date :message-id:subject:to:content-type:content-transfer-encoding; bh=sN/t47evYQ+FFNKhJHTCXabH5EkDNsz06N/TeBZs4gU=; b=Oa2jaib5Kqc1o/L+Qh6BsztcjHYx9i35gpoCHuXLx+W0nW0HZIh+oIIaNo9bRbZIC0 MeerJ+Ft4k8J0Q7KTnxTg65l/BWnpuzKGEdw73dLjp09IfqF/kO6UhVWOzFTEWA3vMfD G/On7HbjElom3lhe9WjLzQq4KGot4w4LAhr+Q=
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; b=YIrB6j5M2Q1UywRIOeSlozCG0yBGynzVy2snfg0iIfFVxx5pzZTk7NY1FiN6x8sInZ 29qYvle2Onta+15Hj1cr4qtM7ydiXagTz/EqgGY6+nMMehsAII5Rc0P35LiNH/n+vapP Auf9PqKO404iEJJi7QndNzwwPmkbGMk+Egn8Y=

>> If A is of size n*n, then just reading it is n^2 memory accesses,
>> while you trick would be saving only n memory accesses (since e is a
>> vector of size n), so I'd say it's going to be negligible unless n is
>> very small. Also, by computing e coefficient-wise, you lose the
>> benefit of Eigen's cache-friendly matrix-vector product
>> implementation, which is important for large n.
>
> I guess Márton's suggestion would first and foremost reduce the read-cost of
> A:
> If A is stored row-major, each row could be multiplied by x, and the
> resulting scalar times that row transposed can be added to g. Assuming that
> g and at least one row of A fit into the cache this could reduce RAM access
> by a factor of 2.
>

I did exactly that and benchmarked it:

Scalar objfunc(const Matrix<Scalar, Dynamic, 1>&  x, Matrix<Scalar,
Dynamic, 1>&  g)
{
    Scalar f(0);
    g.resize(x.size());
    g.setZero();

    for (ptrdiff_t i=0; i<AT.cols(); ++i)
    {
        typename Matrix<Scalar, Dynamic, Dynamic>::ColXpr ai = AT.col(i);
        Scalar e = ai.dot(x) - b(i);
        g += e*ai;
        f += e*e;
    }
    return Scalar(0.5) * f;
}

Benchmarking reveals that regardless of the size of A (1024x1024,
1024x4096, 4096x1024, 4096x4096), the original code is faster.
Conclusion: don't try to outsmart Gael (at least if your matrices
don't fit into the cache)...

Marton

References:
- [eigen] Advanced vectorization
  - From: Márton Danóczy
- Re: [eigen] Advanced vectorization
  - From: Benoit Jacob
- Re: [eigen] Advanced vectorization
  - From: Christoph Hertzberg

Messages sorted by: [ date | thread ]
Prev by Date: Re: [eigen] Advanced vectorization
Next by Date: [eigen] Custom complex type
Previous by thread: Re: [eigen] Advanced vectorization
Next by thread: [eigen] Custom complex type

Mail converted by MHonArc 2.6.19+

http://listengine.tuxfamily.org/