[eigen] Performance colwise matrix mult vs. raw loop


*To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
*Subject*: [eigen] Performance colwise matrix mult vs. raw loop
*From*: Norbert Wenzel <norbert.wenzel.lists@xxxxxxxxx>
*Date*: Sat, 13 Jan 2018 13:20:11 +0100

Hi,

I'm using Eigen in a program that reads a list of (millions of) 3D points in bulk, rigidly transforms these points, and then processes the transformed points. While optimizing the program (on Linux) I found that I could gain a few percent of runtime by replacing a colwise matrix multiplication with a raw loop that, from my perspective, does essentially the same thing. Note that this gain was measured for the whole program, not for the transformation part alone.

My first question is whether my assumption is correct that the following two code fragments do the same thing and are therefore comparable:

    #include <Eigen/Geometry>

    // 3 x N list of points, one point per column, stored row-major.
    using pointlist = Eigen::Matrix<double, 3, Eigen::Dynamic, Eigen::RowMajor>;
    using transform = Eigen::Transform<double, 3, Eigen::Isometry>;

    extern pointlist points;
    extern const transform trans;

    // Variant 1: transform all columns with a single colwise expression.
    void eigen_matrix_transformation()
    {
        points = (trans * points.colwise().homogeneous());
    }

    // Variant 2: transform one column at a time, from the last column to the first.
    void raw_loop_transformation()
    {
        for(auto c = points.cols(); c > 0; --c)
        {
            points.col(c-1) = trans * points.col(c-1).homogeneous();
        }
    }

Although the number of points in the matrix is not known at compile time, the dimensionality of the points is, which should be enough to determine the sizes of any intermediate storage at compile time.

I've extracted the code[0] from my program and found that the runtime difference between these implementations is ~3x. When comparing the generated code[1], it seems that the colwise matrix multiplication allocates dynamic memory whereas the raw loop does not. (At least it may throw std::bad_alloc.)

Note that I could reproduce these results on different Linux machines, but not on Windows. However, on my Windows laptop the overall benchmark ran faster in a Linux virtual machine than on the Windows host.

I'm not sure whether I'm using a sub-optimal Eigen function for my task, or whether the Geometry module (or at least this part of it) is less optimized because other parts of Eigen were more important to optimize.

So I'd like to hear from you whether you can reproduce my benchmark results and/or consider this an issue. Is this perhaps already known? I've currently replaced the colwise matrix multiplication with a raw loop in my code, so this is not a pressing issue for me at all. I was just really surprised by the results I got. I'd like to hear your opinions about this benchmark.

Thanks for your work on Eigen,
Norbert

[0] https://github.com/norbertwenzel/eigen-benchmark
[1] https://godbolt.org/g/atG6uA

**Follow-Ups**: **Re: [eigen] Performance colwise matrix mult vs. raw loop** (*From:* Gael Guennebaud)
