Re: [eigen] How can I improve this function



On 22.09.2012 23:17, Gael Guennebaud wrote:
  you could try:

x = (x.array()>0).select(xval, (y.array()>0).select(yval, x) );
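
For reference, here is what the nested select computes per element, written as a plain loop. This is only a sketch to pin down the semantics: combine_reference is a made-up name, it works on std::vector<float> rather than an Eigen type, and it assumes xval and yval are scalars.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of Eigen): the per-element meaning
// of the nested select, written as a plain loop.
void combine_reference(std::vector<float>& x, float xval,
                       const std::vector<float>& y, float yval) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (x[i] > 0)      x[i] = xval;  // outer select: condition on x
    else if (y[i] > 0) x[i] = yval;  // inner select: condition on y
    // otherwise x[i] keeps its old value (the innermost "else" branch)
  }
}
```

A loop like this is also handy as a baseline when checking a hand-vectorized version.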

Nota bene (only if you care about speed; eventually Eigen itself should handle this): if you have SSE4.1, can ignore NaNs, and can replace >0 with >=0, the above expression can be implemented with just two blendv instructions plus a few loads and stores. For example, with T=float (not tested, no handling of remaining elements, ...):

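#include <Eigen/Core>
#include <smmintrin.h>  // SSE4.1: _mm_blendv_ps

// Untested sketch; assumes an SSE build where Eigen's float packet is Packet4f.
void combine(Eigen::Matrix<float, Eigen::Dynamic, 1> & x, const float xval,
             const Eigen::Matrix<float, Eigen::Dynamic, 1> & y, const float yval) {
  using namespace Eigen::internal;
  const size_t len = x.rows();
  Packet4f xv = pset1<Packet4f>(xval), yv = pset1<Packet4f>(yval);
  // idx + 4 <= len (rather than idx <= len-4) avoids unsigned wrap-around for len < 4
  for (size_t idx = 0; idx + 4 <= len; idx += 4) {
    Packet4f xx = x.packet<Eigen::Aligned>(idx), yy = y.packet<Eigen::Aligned>(idx);
    // blendv(a, b, m) picks b where the sign bit of m is set, otherwise a
    Packet4f res = _mm_blendv_ps(xv, _mm_blendv_ps(yv, xx, yy), xx);
    x.writePacket<Eigen::Aligned>(idx, res);
  }
  // todo: handle remaining elements ...
}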
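
One possible way to fill in the todo is a scalar tail loop that applies the same sign-bit rule as blendv. The sketch below is a standalone variant, not Eigen code: combine_blendv is a hypothetical name, it takes std::vector<float> and uses unaligned loads/stores instead of Eigen's packet accessors, and it falls back to the scalar loop entirely when the compiler does not define __SSE4_1__. The vector loop bound is written as idx + 4 <= len so that it cannot wrap around for vectors shorter than four elements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__SSE4_1__)
#include <smmintrin.h>  // _mm_blendv_ps
#endif

// Hypothetical standalone variant of combine() on std::vector<float>.
void combine_blendv(std::vector<float>& x, float xval,
                    const std::vector<float>& y, float yval) {
  assert(x.size() == y.size());
  const std::size_t len = x.size();
  std::size_t idx = 0;
#if defined(__SSE4_1__)
  const __m128 xv = _mm_set1_ps(xval), yv = _mm_set1_ps(yval);
  for (; idx + 4 <= len; idx += 4) {
    const __m128 xx = _mm_loadu_ps(x.data() + idx);
    const __m128 yy = _mm_loadu_ps(y.data() + idx);
    // blendv(a, b, m) takes b where the sign bit of m is set, otherwise a.
    const __m128 res = _mm_blendv_ps(xv, _mm_blendv_ps(yv, xx, yy), xx);
    _mm_storeu_ps(x.data() + idx, res);
  }
#endif
  // Scalar tail (and complete fallback without SSE4.1), same sign-bit rule.
  for (; idx < len; ++idx) {
    if (!std::signbit(x[idx]))      x[idx] = xval;
    else if (!std::signbit(y[idx])) x[idx] = yval;
    // otherwise x[idx] is left unchanged
  }
}
```

Using std::signbit in the tail keeps the leftover elements consistent with the vectorized part, including for -0.0.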

Note that blendv looks only at the most significant bit (the sign bit) of each element of the last operand, so the test for >=0 comes for free (with -0.0 counted as negative). Without SSE4.1 you can construct something similar using pcmp and some bit-twiddling.
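
A rough idea of what such a fallback could look like, as a sketch under my own assumptions: it uses the floating-point compare _mm_cmpge_ps (cmpps) rather than an integer pcmp to build the masks, then emulates the blend as (mask & a) | (~mask & b). The name combine_sse2 is made up, the function works on std::vector<float> with unaligned loads, and it drops to a plain scalar loop when __SSE2__ is not defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // also pulls in the SSE1 float intrinsics used below
#endif

// Hypothetical fallback without blendv: compare masks plus and/andnot/or.
void combine_sse2(std::vector<float>& x, float xval,
                  const std::vector<float>& y, float yval) {
  assert(x.size() == y.size());
  const std::size_t len = x.size();
  std::size_t idx = 0;
#if defined(__SSE2__)
  const __m128 zero = _mm_setzero_ps();
  const __m128 xv = _mm_set1_ps(xval), yv = _mm_set1_ps(yval);
  for (; idx + 4 <= len; idx += 4) {
    const __m128 xx = _mm_loadu_ps(x.data() + idx);
    const __m128 yy = _mm_loadu_ps(y.data() + idx);
    const __m128 xmask = _mm_cmpge_ps(xx, zero);  // all ones where x >= 0
    const __m128 ymask = _mm_cmpge_ps(yy, zero);  // all ones where y >= 0
    // inner = y >= 0 ? yval : x
    const __m128 inner =
        _mm_or_ps(_mm_and_ps(ymask, yv), _mm_andnot_ps(ymask, xx));
    // res = x >= 0 ? xval : inner
    const __m128 res =
        _mm_or_ps(_mm_and_ps(xmask, xv), _mm_andnot_ps(xmask, inner));
    _mm_storeu_ps(x.data() + idx, res);
  }
#endif
  // Scalar tail (and complete fallback without SSE2).
  for (; idx < len; ++idx) {
    if (x[idx] >= 0)      x[idx] = xval;
    else if (y[idx] >= 0) x[idx] = yval;
  }
}
```

Unlike the blendv version, this one does a real >=0 comparison, so -0.0 counts as non-negative and a NaN fails both tests.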


Christoph


--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: +49 (421) 218-64252
----------------------------------------------


