[eigen] vectorization of loops using eigen arrays. |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
Hi
I'm was implementing distance calculations in python for prototyping and then
switched to C++ with eigen to speed up the calculations. To my surprise on the
test data the python program is faster by a factor of two then my C++ program.
Perf-stat tells me that neither python nor c++ have cache-misses but the eigen
implementation uses more then twice as much cycles, even though I have to load
the whole python interpreter. The data and sources are shown below.
In python I have used a brute-force C implementation for the pairwise distances
from scipy. The code looks pretty similar to my C code.
My guess is that I either have not chosen a good representation in memory of my
data or that the python version is making a lot more use of vectorization then
my C++ code does.
How can I check/force the vectorization of my eigen code?
best, Max
DATA and SOURCES
================
Cycles used up by the different programs
Eigen: 73,132,906,861 cycles
Blaze: 39,290,635,107 cycles
Python: 29,960,555,581 cycles
The code for the programs, test data and the complete results of the perf-stats
measurements can be found at.
https://gist.github.com/kain88-de/731d9a2791cece980cc4
gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-4.9.2/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--disable-libssp --enable-gnu-unique-object --enable-linker-build-id
--enable-cloog-backend=isl --disable-isl-version-check
--disable-cloog-version-check --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu --disable-multilib
--disable-werror --enable-checking=release Thread model: posix gcc version
4.9.2 (GCC)
#include <cmath>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <Eigen/Core>
typedef Eigen::Array<double, Eigen::Dynamic, 3> Array;
typedef Eigen::Array<int, 3, 1> Vector;
std::size_t countModelsInPDB(const std::string& pdbfile) {
std::ifstream inFile(pdbfile);
std::string line;
std::size_t count = 0;
while(std::getline(inFile, line)) {
if (line.find("MODEL") != std::string::npos) {
++count;
}
}
return count;
}
std::size_t countAtomsInPDB(const std::string& pdbfile) {
std::size_t count = 0;
std::string line;
std::ifstream inFile(pdbfile);
bool doCount = false;
// count occurenes of ATOM between first MODEL and ENDMDL
while (std::getline(inFile, line)) {
if (line.find("MODEL") != std::string::npos) {
doCount = true;
}
if (doCount && line.find("ATOM") != std::string::npos) {
++count;
}
if (line.find("ENDMDL") != std::string::npos) {
break;
}
}
return count;
}
Array readCoordinates(std::ifstream& inFile, const std::size_t atomCount) {
std::string line;
Array xyz(atomCount, 3);
while (std::getline(inFile, line)) {
if (line.find("ENDMDL") != std::string::npos) {
break;
} else if (line.find("ATOM") != std::string::npos) {
std::istringstream iss(line);
std::string atom, atomType, aaType, wtf, index;
double x,y,z;
std::size_t atomIndex;
iss >> atom >> atomIndex >> atomType >> aaType >> wtf >> index >> x >> y >> z;
xyz(atomIndex - 1, 0) = x;
xyz(atomIndex - 1, 1) = y;
xyz(atomIndex - 1, 2) = z;
}
}
return xyz;
}
std::vector<Array> readModel(const std::string& pdbfile) {
std::size_t atomCount = countAtomsInPDB(pdbfile);
std::vector<Array> models;
std::string line;
std::ifstream inFile(pdbfile);
while (std::getline(inFile, line)) {
if (line.find("MODEL") != std::string::npos) {
models.push_back(readCoordinates(inFile, atomCount));
}
}
return models;
}
void writeDistanceMatrix(const Eigen::ArrayXXd& dist,
const std::string& outfilename) {
std::ofstream out(outfilename);
auto size = dist.rows();
for (int i=0; i<size; ++i) {
for (int j=0; j<size; ++j) {
out << i << "\t" << j << "\t" << dist(i, j) << std::endl;
}
}
}
double DRMS(const Array& A, const Array& B, const Vector& ps, const Vector& pe) {
double d = 0;
uint32_t np = 0;
std::size_t nrp = ps.size();
for (int m1=0; m1<nrp; ++m1) {
for (int m2=m1+1; m2<nrp; ++m2) {
for (int k1=ps[m1]; k1<pe[m1]+1; ++k1) {
for (int k2=ps[m2]; k2<pe[m2]+1; ++k2) {
double di2 = 0;
double dj2 = 0;
for (int l=0; l<3; ++l) {
di2 += pow(A(k1, l) - A(k2, l), 2);
dj2 += pow(B(k1, l) - B(k2, l), 2);
}
++np;
d += pow(sqrt(dj2) - sqrt(di2), 2);
}
}
}
}
return d;
}
int main(int argc, char* argv[]) {
Vector ps, pe;
ps << 0 , 148 , 198;
pe << 131 , 163 , 213;
auto strucs = readModel(argv[1]);
int Nstr = strucs.size();
Eigen::ArrayXXd dist(Nstr, Nstr);
// check if we read the pdb correct
// auto& tmp = strucs[0];
// for (int i=0; i<4; ++i) {
// std::cout << tmp(0, i) << " " << tmp(1,i) << " " << tmp(2, i) << std::endl;
// }
for (int i=0; i<Nstr ; ++i) {
dist(i, i) = 0;
for (int j=i+1; j<Nstr; ++j) {
dist(i, j) = DRMS(strucs[i], strucs[j], ps, pe);
dist(j, i) = dist(i, j);
// std::cout << "Distances (" << i << "," << j << ") = " << dist(i,j) << ", np=" << np <<std::endl;
}
}
writeDistanceMatrix(dist, argv[2]);
}