Re: [eigen] matrix::linspace |

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

*To*: eigen@xxxxxxxxxxxxxxxxxxx*Subject*: Re: [eigen] matrix::linspace*From*: Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>*Date*: Sun, 24 Jan 2010 17:04:28 +0100*Dkim-signature*: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=Nwu/rwGFJ4Hiu8rDIZga7EH8Eqwei1V4RBxrURZYbRw=; b=vIWuSEO0AVo167PjUuIrFMWX4+pBktR8hCpepgJulFX86SopN/l42wzWtYYqtY1k5A 85PGwpKls9wfG6ap7++ZoJRQPNvNur6na6basg6RWzo9G/gOj/7OAfTplcb//EO32aFJ uKMIb1d2k3GXNCWh1FUxttM/nlb8JBqQ96QOo=*Domainkey-signature*: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=k/lC6KqHmM3WggxIUe6IxGAyjuzAYrd2b5vUccMTHxbLXXAr8kHjohd0No0RedHUeO Iv1dazHPHXQZsWERLJ2TxGCh949Sw6+uHSYvjPBS+5chLG6IJgccN13leMhnlj7mtSg5 mxiEQ4zMwqRGQqS480SIvqUT7rnRx76dpn644=

I tried using sequential packet access. Unfortunately, it did not help. Maybe I am still missing something. The required patch (indexed packet access) is attached to this mail as well as my test code. - Hauke On Sat, Jan 23, 2010 at 4:03 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote: > 2010/1/23 Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>: >> Hi there, >> >> I am working on some functionality which would allow something like this: >> >> VectorXd a = VectorXd::linspace(low,high,num_steps) >> >> which should be corresponding to: >> >> const double step_size = (high-low)/(num_steps-1); >> VectorXd a(num_steps); >> for (int i=0; i<num_steps; ++i) >> a(i) = low+i*step_size; >> >> In Eigen, this is best implemented based on NullaryExpressions, i.e. >> Matrices, where each element can be computed based on some functor. > > I agree! > >> In >> theory, this should not only be nice from a usability point of view >> but I also hoped of gaining from vectorization. It tuns out, that this >> is not that easy - it maybe due to MSVC but maybe also due to my lack >> of SSE skills. For the most simple example >> >> for (int i=0; i<num_steps; ++i) a(i) = i; >> >> I managed to implement the functor's method ei_linspace_op::packetOp >> such that we do actually gain from vectorization as (note that I added >> integer parameters to it): >> >> // for doubles only >> ei_linspace_op::packetOp(int i, int = 0) >> { >> return ei_padd(ei_pset1<Scalar>(i),ei_pset<Scalar>(0,1)); >> } >> >> For me the first surprise was that this performs way better than just >> returning ei_pset<Scalar>(i,i+1). > > Notice that reading from memory is costly, and ei_pset1 is costly > because it needs a SHUFPD instruction, so naively I would think that > the best that can happen is that you make once and for all a packet > (0,1) and a packet (1,1) and at each step you only add the latter into > the former. Perhaps your code is fast because it allowed the compiler > to do that optimization? > >> Next, I looked into the case: >> >> for (int i=0; i<num_steps; ++i) a(i) = low+i >> >> and here, I totally failed. Returning >> >> return ei_pset<Scalar>(m_low+i,m_low+i+1); >> >> resulted in no statistically meaningful gain (~0.06%) speedup. The >> same holds for >> >> return ei_padd(ei_pset1<Scalar>(i),ei_padd(ei_pset1(m_low),ei_pset<Scalar>(0,1))); >> >> which is even sometimes slower and permuting the operators did not help either. >> >> So, I am sort of puzzled about what I could improve. Any ideas? Any >> SSE operation, I am not aware of for creating (a, a+1, a+2, ...) > > Same remark as above. I would initially store the packet (low,low+1) > in a SSE register, store (1,1) into another SSE register, and only do > a ei_padd at each iteration. Naively I would think that that is much > faster? > >> Finally, note that ei_pset is not yet in PacketMath. Over here, I >> added forward delarations into GenericPacketMath and the actual SSE >> implementation in the corresponding PacketMath header (no Altivec >> yet). They GenericPacketMath declarations (as the name says) have no >> implementations since I don't see any meaningful way of providing them >> nor do I think they are useful. > > Indeed I think they're only used in the matrix product code, which > doesn't use your new ei_pset function. > >> The second change in the lib was >> mentioned above - I added indices to the packetOp functions. > > Oh right, you need to fit your stuff with our packetOp infrastructure. > Hm, not easy indeed, as I was assuming sequential, not random access. > >> >> If everything regarding the vectorization fails, we could still >> provide the functionality (with a specialization for the trivial case >> where we gain from vectorizing) because at least, I think it is nifty. > > We might fail to vectorize random access efficiently. But 95% of use > cases will be sequential access. This could then be a use case for a > ReturnByValue function. When nesting this into a larger expression, > this would have the drawback of introducing a temporary, and the > advantage of enabling full vectorization instead of preventing > vectorization of the larger expression. > > Benoit > > >

**Attachment:
main.cpp**

**Attachment:
cwisenullary.patch**

**Follow-Ups**:**Re: [eigen] matrix::linspace***From:*Benoit Jacob

**References**:**[eigen] matrix::linspace***From:*Hauke Heibel

**Re: [eigen] matrix::linspace***From:*Benoit Jacob

**Messages sorted by:**[ date | thread ]- Prev by Date:
**Re: [eigen] FFT update** - Next by Date:
**Re: [eigen] FFT update** - Previous by thread:
**Re: [eigen] matrix::linspace** - Next by thread:
**Re: [eigen] matrix::linspace**

Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |