[eigen] matrix::linspace

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


Hi there,

I am working on some functionality which would allow something like this:

  VectorXd a = VectorXd::linspace(low,high,num_steps)

which should be corresponding to:

  const double step_size = (high-low)/(num_steps-1);
  VectorXd a(num_steps);
  for (int i=0; i<num_steps; ++i)
   a(i) = low+i*step_size;

In Eigen, this is best implemented based on NullaryExpressions, i.e.
Matrices, where each element can be computed based on some functor. In
theory, this should not only be nice from a usability point of view
but I also hoped of gaining from vectorization. It tuns out, that this
is not that easy - it maybe due to MSVC but maybe also due to my lack
of SSE skills. For the most simple example

  for (int i=0; i<num_steps; ++i) a(i) = i;

I managed to implement the functor's method ei_linspace_op::packetOp
such that we do actually gain from vectorization as (note that I added
integer parameters to it):

  // for doubles only
  ei_linspace_op::packetOp(int i, int = 0)
  {
    return ei_padd(ei_pset1<Scalar>(i),ei_pset<Scalar>(0,1));
  }

For me the first surprise was that this performs way better than just
returning ei_pset<Scalar>(i,i+1).

Next, I looked into the case:

  for (int i=0; i<num_steps; ++i) a(i) = low+i

and here, I totally failed. Returning

  return ei_pset<Scalar>(m_low+i,m_low+i+1);

resulted in no statistically meaningful gain (~0.06%) speedup. The
same holds for

  return ei_padd(ei_pset1<Scalar>(i),ei_padd(ei_pset1(m_low),ei_pset<Scalar>(0,1)));

which is even sometimes slower and permuting the operators did not help either.

So, I am sort of puzzled about what I could improve. Any ideas? Any
SSE operation, I am not aware of for creating (a, a+1, a+2, ...)

Finally, note that ei_pset is not yet in PacketMath. Over here, I
added forward delarations into GenericPacketMath and the actual SSE
implementation in the corresponding PacketMath header (no Altivec
yet). They GenericPacketMath declarations (as the name says) have no
implementations since I don't see any meaningful way of providing them
nor do I think they are useful. The second change in the lib was
mentioned above - I added indices to the packetOp functions.

If everything regarding the vectorization fails, we could still
provide the functionality (with a specialization for the trivial case
where we gain from vectorizing) because at least, I think it is nifty.

- Hauke



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/