Re: [eigen] How can I improve this function
On 22.09.2012 23:17, Gael Guennebaud wrote:
you could try:
x = (x.array()>0).select(xval, (y.array()>0).select(yval, x) );
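
For reference, the nested select above can be read element by element as in the following sketch. It uses plain std::vector instead of Eigen types so that it stands alone; combine_reference is a hypothetical name used only for illustration, not an Eigen function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reading of
//   x = (x.array()>0).select(xval, (y.array()>0).select(yval, x));
// combine_reference is a hypothetical helper, not part of Eigen.
void combine_reference(std::vector<float>& x, float xval,
                       const std::vector<float>& y, float yval) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > 0.0f) {
            x[i] = xval;   // outer select: condition on x
        } else if (y[i] > 0.0f) {
            x[i] = yval;   // inner select: condition on y
        }                  // otherwise x[i] keeps its value
    }
}
```

The outer condition takes priority: y is only consulted where x is not positive.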
Nota bene (only if you care for speed -- eventually this should be
handled by Eigen itself):
If you have SSE4.1, can ignore NaNs, and can replace >0 by >=0, the above
expression could actually be implemented using just two blendv
instructions plus some loads/stores.
For example with T=float (not tested, no handling of remaining elements,
...):
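#include <Eigen/Core>
#include <smmintrin.h>  // SSE4.1, provides _mm_blendv_ps

void combine(Eigen::Matrix<float, Eigen::Dynamic, 1> & x, const float xval,
             const Eigen::Matrix<float, Eigen::Dynamic, 1> & y, const float yval) {
  using namespace Eigen::internal;
  const size_t len = x.rows();
  // pset1 needs the packet type spelled out explicitly
  const Packet4f xv = pset1<Packet4f>(xval), yv = pset1<Packet4f>(yval);
  // idx + 4 <= len avoids the unsigned underflow of len-4 when len < 4
  for (size_t idx = 0; idx + 4 <= len; idx += 4) {
    const Packet4f xx = x.packet<Eigen::Aligned>(idx), yy = y.packet<Eigen::Aligned>(idx);
    // inner blend: y >= 0 ? yval : x;  outer blend: x >= 0 ? xval : inner
    const Packet4f res = _mm_blendv_ps(xv, _mm_blendv_ps(yv, xx, yy), xx);
    x.writePacket<Eigen::Aligned>(idx, res);
  }
  // todo: handle remaining elements ...
}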
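Note that blendv checks only the sign bit (the most significant bit) of
each element of the last operand, so the comparison for >=0 comes for
free. The one difference from a real comparison is -0.0: its sign bit is
set, so it is treated as negative.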
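
To make the sign-bit behaviour concrete, here is a portable scalar model of one blendv lane and of the two nested blends. It is only a sketch for checking the logic without SSE4.1 hardware; blendv_lane and combine_blendv_model are hypothetical names, and std::vector stands in for the Eigen vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Model of one lane of blendvps: pick b if the sign bit of mask is set,
// otherwise pick a. (Hypothetical helper, for illustration only.)
inline float blendv_lane(float a, float b, float mask) {
    return std::signbit(mask) ? b : a;
}

// Scalar model of the two nested blends from the combine() snippet.
void combine_blendv_model(std::vector<float>& x, float xval,
                          const std::vector<float>& y, float yval) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // inner blend: sign bit of y clear -> yval, set -> keep x
        const float inner = blendv_lane(yval, x[i], y[i]);
        // outer blend: sign bit of x clear -> xval, set -> inner result
        x[i] = blendv_lane(xval, inner, x[i]);
    }
}
```

With x = 0 this model yields xval (the >=0 behaviour), while x = -0.0 falls through to the inner blend because its sign bit is set.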
Without SSE4.1 you can construct something using pcmp and some
bit-twiddling.
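
One possible shape of that bit-twiddling, written as portable scalar code so the idea can be checked anywhere: expand the sign bit into an all-ones or all-zeros mask, then merge the two candidates with and/andnot/or. On SSE2 the mask would presumably come from an integer compare against zero (pcmpgtd) or an arithmetic shift by 31 (psrad), and the merge from andps/andnps/orps; the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterpret a float as its 32-bit pattern and back (memcpy avoids
// aliasing problems).
inline std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
inline float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Returns b if the sign bit of m is set, else a, using only bit operations.
inline float select_by_sign(float a, float b, float m) {
    // 0xFFFFFFFF if the sign bit of m is set, 0 otherwise
    const std::uint32_t mask =
        static_cast<std::uint32_t>(0) - (bits_of(m) >> 31);
    return float_of((mask & bits_of(b)) | (~mask & bits_of(a)));
}

// Same nesting as the blendv version, built on the bitwise select.
void combine_bitwise(std::vector<float>& x, float xval,
                     const std::vector<float>& y, float yval) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float inner = select_by_sign(yval, x[i], y[i]);
        x[i] = select_by_sign(xval, inner, x[i]);
    }
}
```

This costs three logical operations per select instead of one blendv, plus the instruction that builds the mask.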
Christoph
--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen
Tel: +49 (421) 218-64252
----------------------------------------------