Re: [eigen] matrix::linspace |

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

*To*: eigen@xxxxxxxxxxxxxxxxxxx*Subject*: Re: [eigen] matrix::linspace*From*: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>*Date*: Sat, 23 Jan 2010 10:03:51 -0500*Dkim-signature*: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=uTepoty1tzpotjtVj0t+U5S2ifYyvFm19qHSV1qRhk4=; b=Or60n2dBfibR0lXrDx+lptR9yO4p+Y/FFU0jYTdrRatJoK+cjaW717aFCrpP+P/LDT DnxgIbgbgOgYPC3myV/eaAcMYzsUMcChj9cLo3lwdXAV9iuXD29X9Cq/QbPo5zXiMBvX yyJVH/V4rOi0nA21JGS7zz+u4uFzCQbIzFbo8=*Domainkey-signature*: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=E9Rp9txNQrc0SEZgYaO+1A/mGeugazrTS5GILs+PB5YsUYxkcVL4JbmwiZ9NJmYYB3 DsgVCla/vobgHi2jgvlPGc6Eb6wGT7DbrxFrGya+KyQmGjYvYm9aFxCUhhalrrPf9eKZ suGwb+O9nrOpO6x/q/wA74+RE22J5V6Kym1RQ=

2010/1/23 Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>: > Hi there, > > I am working on some functionality which would allow something like this: > > VectorXd a = VectorXd::linspace(low,high,num_steps) > > which should be corresponding to: > > const double step_size = (high-low)/(num_steps-1); > VectorXd a(num_steps); > for (int i=0; i<num_steps; ++i) > a(i) = low+i*step_size; > > In Eigen, this is best implemented based on NullaryExpressions, i.e. > Matrices, where each element can be computed based on some functor. I agree! > In > theory, this should not only be nice from a usability point of view > but I also hoped of gaining from vectorization. It tuns out, that this > is not that easy - it maybe due to MSVC but maybe also due to my lack > of SSE skills. For the most simple example > > for (int i=0; i<num_steps; ++i) a(i) = i; > > I managed to implement the functor's method ei_linspace_op::packetOp > such that we do actually gain from vectorization as (note that I added > integer parameters to it): > > // for doubles only > ei_linspace_op::packetOp(int i, int = 0) > { > return ei_padd(ei_pset1<Scalar>(i),ei_pset<Scalar>(0,1)); > } > > For me the first surprise was that this performs way better than just > returning ei_pset<Scalar>(i,i+1). Notice that reading from memory is costly, and ei_pset1 is costly because it needs a SHUFPD instruction, so naively I would think that the best that can happen is that you make once and for all a packet (0,1) and a packet (1,1) and at each step you only add the latter into the former. Perhaps your code is fast because it allowed the compiler to do that optimization? > Next, I looked into the case: > > for (int i=0; i<num_steps; ++i) a(i) = low+i > > and here, I totally failed. Returning > > return ei_pset<Scalar>(m_low+i,m_low+i+1); > > resulted in no statistically meaningful gain (~0.06%) speedup. The > same holds for > > return ei_padd(ei_pset1<Scalar>(i),ei_padd(ei_pset1(m_low),ei_pset<Scalar>(0,1))); > > which is even sometimes slower and permuting the operators did not help either. > > So, I am sort of puzzled about what I could improve. Any ideas? Any > SSE operation, I am not aware of for creating (a, a+1, a+2, ...) Same remark as above. I would initially store the packet (low,low+1) in a SSE register, store (1,1) into another SSE register, and only do a ei_padd at each iteration. Naively I would think that that is much faster? > Finally, note that ei_pset is not yet in PacketMath. Over here, I > added forward delarations into GenericPacketMath and the actual SSE > implementation in the corresponding PacketMath header (no Altivec > yet). They GenericPacketMath declarations (as the name says) have no > implementations since I don't see any meaningful way of providing them > nor do I think they are useful. Indeed I think they're only used in the matrix product code, which doesn't use your new ei_pset function. > The second change in the lib was > mentioned above - I added indices to the packetOp functions. Oh right, you need to fit your stuff with our packetOp infrastructure. Hm, not easy indeed, as I was assuming sequential, not random access. > > If everything regarding the vectorization fails, we could still > provide the functionality (with a specialization for the trivial case > where we gain from vectorizing) because at least, I think it is nifty. We might fail to vectorize random access efficiently. But 95% of use cases will be sequential access. This could then be a use case for a ReturnByValue function. When nesting this into a larger expression, this would have the drawback of introducing a temporary, and the advantage of enabling full vectorization instead of preventing vectorization of the larger expression. Benoit

**Follow-Ups**:**Re: [eigen] matrix::linspace***From:*Hauke Heibel

**References**:**[eigen] matrix::linspace***From:*Hauke Heibel

**Messages sorted by:**[ date | thread ]- Prev by Date:
**Re: [eigen] Clamping the coeffs of a matrix** - Next by Date:
**Re: [eigen] recommended reading for template metaprogramming** - Previous by thread:
**Re: [eigen] matrix::linspace** - Next by thread:
**Re: [eigen] matrix::linspace**

Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |