Re: [eigen] Advanced vectorization


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] Advanced vectorization
*From*: Márton Danóczy <marton78@xxxxxxxxx>
*Date*: Mon, 6 Jun 2011 20:44:38 +0200

>> If A is of size n*n, then just reading it is n^2 memory accesses,
>> while your trick would be saving only n memory accesses (since e is a
>> vector of size n), so I'd say it's going to be negligible unless n is
>> very small. Also, by computing e coefficient-wise, you lose the
>> benefit of Eigen's cache-friendly matrix-vector product
>> implementation, which is important for large n.
>
> I guess Márton's suggestion would first and foremost reduce the read-cost of A:
> If A is stored row-major, each row could be multiplied by x, and the
> resulting scalar times that row transposed can be added to g. Assuming that
> g and at least one row of A fit into the cache, this could reduce RAM access
> by a factor of 2.

I did exactly that and benchmarked it:

    Scalar objfunc(const Matrix<Scalar, Dynamic, 1>& x, Matrix<Scalar, Dynamic, 1>& g)
    {
        Scalar f(0);
        g.resize(x.size());
        g.setZero();

        for (ptrdiff_t i = 0; i < AT.cols(); ++i)
        {
            typename Matrix<Scalar, Dynamic, Dynamic>::ColXpr ai = AT.col(i);
            Scalar e = ai.dot(x) - b(i);
            g += e * ai;
            f += e * e;
        }

        return Scalar(0.5) * f;
    }

Benchmarking reveals that regardless of the size of A (1024x1024, 1024x4096, 4096x1024, 4096x4096), the original code is faster.

Conclusion: don't try to outsmart Gael (at least if your matrices don't fit into the cache)...

Marton

**References**:

- **[eigen] Advanced vectorization**, *From:* Márton Danóczy
- **Re: [eigen] Advanced vectorization**, *From:* Benoit Jacob
- **Re: [eigen] Advanced vectorization**, *From:* Christoph Hertzberg


Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/