Re: [eigen] Re: small sums: vectorization not worth it


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] Re: small sums: vectorization not worth it
*From*: "Benoit Jacob" <jacob.benoit.1@xxxxxxxxx>
*Date*: Sat, 17 Jan 2009 14:38:06 +0100

2009/1/17 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> hm for a vector of size 64 your result does not make sense. Actually I
> checked the generated assembly and your benchmark has several issues
> such that you are not really benchmarking sum. First, the first
> expression (v = one + v * 1e-10) was not properly inlined when the
> vectorization was enabled. Second, this first expression is more
> costly than sum. Third, you call sum twice on (almost) the same data,
> and so perhaps the compiler manages to remove most of the computation
> of the second call to sum only when the vectorization is disabled.
>
> So I rewrote the benchmark, see attached file, and now I get a slight
> speed up for double/64 and more than x2 for float/64. The generated
> assembly is pretty good in both cases, so why does the vectorization
> not lead to a higher speed up?
>
> Actually, the non-vectorized meta-unroller of sum is much cleverer than
> the vectorized one because it reduces the dependency between the
> instructions using a recursive divide and conquer strategy while the
> vectorized one simply accumulates the coeffs in a single register.

I didn't realize it was so important. So should we have a similar
strategy in the product innervec?

> Another possible issue was that the vectorized unroller loops over the
> coeffs in the wrong order (d,c,b,a instead of a,b,c,d).
>
> Anyway, I rewrote the vectorized meta unroller to use the same good
> strategy, and now I get a significant speed up for double/64: x1.7
> faster! And for float/64 I get x2.75, not bad.

Wow, thanks a lot!

> For very small sizes, it is clear that at least for Vector2d it does
> not make sense to vectorize it. For float, let's check.
>
> However, the reason why I did that change recently is that if we don't
> vectorize Vector4f::sum() then we might lose some other vectorization,
> ex:
>
> (0.5*a+0.25*b+0.25*c).sum();
>
> If we disable vectorization of sum for small sizes, then what we have
> to do in Sum.h is to automatically insert an evaluation:
>
> (0.5*a+0.25*b+0.25*c).eval().sum();
>
> Should be easy to do.

I don't understand. Wouldn't it be easy, and much better, to add a
more intelligent heuristic based on the expression's cost and size,
the packet size, the cost of adding scalars, and perhaps on
ei_cost_of_predux<Scalar>::ret that we may need to introduce?

So with your example, this sum would still be vectorized because the
xpr is costly enough.

Benoit
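
To illustrate the dependency point raised in the quoted message, here is a minimal scalar sketch contrasting the two reduction strategies. It is not Eigen's actual unroller: the names sum_linear and tree_sum are hypothetical, the code works on plain float arrays rather than packets, and it only assumes a C++11 compiler.

```cpp
#include <cstddef>

// Linear accumulation: every addition needs the result of the previous
// one, so the additions form a single dependency chain of length N-1.
template <std::size_t N>
float sum_linear(const float* v) {
    float acc = v[0];
    for (std::size_t i = 1; i < N; ++i) {
        acc += v[i];
    }
    return acc;
}

// Recursive divide and conquer, unrolled at compile time (hypothetical
// helper, not Eigen code). The two halves are summed independently, so
// the longest dependency chain has about log2(Length) additions.
template <std::size_t Start, std::size_t Length>
struct tree_sum {
    static float run(const float* v) {
        constexpr std::size_t Half = Length / 2;
        return tree_sum<Start, Half>::run(v)
             + tree_sum<Start + Half, Length - Half>::run(v);
    }
};

// Base case: a single coefficient.
template <std::size_t Start>
struct tree_sum<Start, 1> {
    static float run(const float* v) { return v[Start]; }
};

// Guard for an empty range so the recursion always terminates.
template <std::size_t Start>
struct tree_sum<Start, 0> {
    static float run(const float*) { return 0.0f; }
};
```

For an array of 8 floats, sum_linear<8>(v) and tree_sum<0, 8>::run(v) compute the same sum with the additions grouped differently. Because floating-point addition is not associative, the two groupings can differ in rounding, which is also why a compiler will generally not turn one form into the other without relaxed floating-point options.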

