Re: [eigen] generic unrollers


- *To*: eigen@xxxxxxxxxxxxxxxxxxx
- *Subject*: Re: [eigen] generic unrollers
- *From*: Benoît Jacob <jacob@xxxxxxxxxxxxxxx>
- *Date*: Tue, 17 Jun 2008 23:42:31 +0200

Hi,

attached are my own measurements. Intel Core 1 (32bit), g++ 4.3.0, compiled with "-O3 -DNDEBUG", that is, without vectorization, to make things more fun since you already measured with vectorization.

As you can see, here the overall winner is "<=", but not by a big margin. It looks like we have yet to find the definitive formula...

Cheers,
Benoit

On Monday 16 June 2008 00:07:22 Gael Guennebaud wrote:
> >> , for the a+b and 2*a cases I'll
> >> write an exhaustive benchmark... If there is no obvious reason to eval
> >> a+b for a 2x2 product then it might be better to not eval since this
> >> allows the user to perform fine tuning for his specific case that is not
> >> possible if we do (abusive?) evaluation.
> >
> > It's great if you do a benchmark, I don't see any other way of moving
> > forward!
>
> here you go (see attached files). So M,N,K denote the sizes of the
> matrix product:
>
>   MxN = MxK * KxN.
>
> I benchmarked both (a+b)*c and (2*a)*c, with 4 different conditions:
> the current one with "<", the same with "<=", never evaluate, and
> evaluate if N>1 (i.e. if a coeff is read at least twice). I compiled
> with gcc-4.2, -O3 -DNDEBUG using float and vectorization enabled.
>
> So for this benchmark it is quite clear that, as expected, "<=" works
> much better than the current "<". But surprisingly, N>1, which
> implies the evaluation of (2*a) with N==2, works even slightly better!
> This is probably because the compiler can cache the temporaries into
> the registers (I have a 64bits CPU, so 16 SSE registers). In that case
> counting for the extra loads and stores is wrong. So we could try this
> one:
>
>   r*SC <= (r-1) * RC
>
> which basically means let's forget the extra store and evaluate even
> if it does not look really better (equality). In practice this should
> give better results (at least for gcc-4.2 with a lot of floating point
> registers).
>
> Gael.
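For readers following the thread, the candidate conditions compared above can be written down as a small compile-time predicate. This is only a sketch under stated assumptions, not Eigen's actual code: `Rule` and `should_evaluate` are hypothetical names, `r` is taken to be the number of times each coefficient of the nested expression is read (N for the left operand of MxK * KxN), `SC` the cost of reading a scalar from an evaluated temporary, and `RC` the cost of reading a coefficient from the unevaluated expression. The `(r+1)*SC` left-hand side used for the "<" and "<=" rules is inferred from the remark about forgetting the extra store, and the cost values in the checks (SC=1, RC=3 for a+b, RC=2 for 2*a) are illustrative guesses.

```cpp
// Hypothetical sketch of the "evaluate the nested expression or not" rules.
// r  : number of times each coefficient of the nested expression is read
// SC : assumed cost of reading one scalar from an evaluated temporary
// RC : assumed cost of reading one coefficient of the unevaluated expression
enum class Rule { Less, LessEqual, Never, IfReadTwice, Proposed };

constexpr bool should_evaluate(Rule rule, int r, int SC, int RC)
{
    return rule == Rule::Less        ? (r + 1) * SC <  (r - 1) * RC  // assumed form of the current "<" rule
         : rule == Rule::LessEqual   ? (r + 1) * SC <= (r - 1) * RC  // same with "<="
         : rule == Rule::Never       ? false                         // never evaluate
         : rule == Rule::IfReadTwice ? r > 1                         // evaluate if N>1
         :                             r * SC <= (r - 1) * RC;       // proposed: drop the extra store
}

// Illustrative costs only: a+b with RC=3, read twice (N==2).
static_assert(!should_evaluate(Rule::Less,      2, 1, 3), "'<' keeps a+b lazy");
static_assert( should_evaluate(Rule::LessEqual, 2, 1, 3), "'<=' evaluates a+b");

// Illustrative costs only: 2*a with RC=2, read twice (N==2).
static_assert(!should_evaluate(Rule::LessEqual, 2, 1, 2), "'<=' keeps 2*a lazy");
static_assert( should_evaluate(Rule::Proposed,  2, 1, 2), "proposed rule evaluates 2*a");
```

With these assumed costs, the proposed rule agrees with "evaluate if N>1" for (2*a) at N==2, which is the case the benchmark singled out.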

**Attachment: EigenCostModel_benoit.ods**


