Re: [eigen] Signed or unsigned indexing

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


On 2017-01-21 22:56, Gael Guennebaud wrote:
That's easy to check, I simply measured:

EIGEN_DONT_INLINE
void foo(MatrixXf &A, MatrixXf &B, Index i, Index j, Index n, Index m)
{
  A.block(i, j, n, m) += B.block(i+1, j+1, n, m);
}

and got 15% of speed difference between int and ptrdiff_t using either
gcc6 or clang3.9.

I have just tested this kind of code on a 32x32 matrix, and the version
using 64-bit integers is indeed 25% faster than the version using 32-bit
integers with Eigen.

But if I code the same thing in plain C++ (without Eigen), with for loops,
I get exactly the same timings with 32-bit integers and 64-bit integers
with both the Intel compilers and clang. With gcc, the version with 32-bit
integers is 4% faster (see codes below).

My feeling is the fact that Eigen is written with intrinsics turns off many
compiler optimizations. Therefore, the difference in between 32-bit and
64-bit integers is very Eigen-specific.
If there was such a difference in between 32-bit and 64-bit integers indexing
in plain C/C++, nobody would index anymore with ints in C.

I measured the conversion of 1000 integers from 'int' or 'unsigned
int' to 'long' and observe 25% of speed difference using either clang
or gcc (in favour of unsigned int).

My claim was not that there was no difference in between converting int and unsigned to long, but rather that this conversion does not happen as often as you might think and almost never makes any difference. For instance, a compiler
might change

void f(double* p, long n) {
  for (int k = 0; k < n; ++k) {
    p[k] += 1.0;
  }
}

to (look at the new type for k)

void f(double* p, long n) {
  for (long k = 0; k < n; ++k) {
    p[k] += 1.0;
  }
}

because overflow for signed integers is undefined behavior. This transformation would be impossible with unsigned integers because of warping behavior. So signed integers give more flexibility to the compiler. Even though the conversion from int to long is more costly than the conversion from unsigned to long, the compiler is often allowed to remove this conversion either completely or at least from the
inner loop with signed integers.

For 32-bit/64-bit integers, there are some corner cases where one is clearly better than the other. This code which tries to find the index of the smallest element of
an array:

int index_smallest(float* p, int n) {
  float small_value = std::numeric_limits<float>::max();
  int small_k = -1;
  for (int k = 0; k < n; ++k) {
    if (p[k] < small_value) {
      small_value = p[k];
      small_k = k;
    }
  }
  return k;
}

The Intel compiler is smart enough to vectorize this loop (this is a reduction). But as both small_value and small_k have to be carried in vector registers, one could understand that it is more efficient if their type have the same size. This is what happens. So if your array is of float, use int for indexes. Using 64-bit integers might lead to a 2x slowdown. But if your array is made of doubles, it is better to
use a 64-bit integer :-)

Francois


PS: Here are the codes used for my benchmark
==== Plain C++
#include <cstddef>

//#define INT std::ptrdiff_t
#define INT int

int main() {
  const int nb_time = 10000000;

  const INT n = 32;
  float* A = new float[n * n];
  float* B = new float[n * n];
  for (INT j = 0; j < n; ++j) {
    for (INT i = 0; i < n; ++i) {
      A[j * n + i] = 0.0f;
      B[j * n + i] = 0.0f;
    }
  }

  for (int k = 0; k < nb_time; ++k) {
    for (INT j = 0; j < n; ++j) {
      for (INT i = 0; i < n; ++i) {
        A[j * n + i] += B[j * n + i];
      }
    }
  }

  return 0;
}
====

==== Eigen
//#define EIGEN_DEFAULT_DENSE_INDEX_TYPE std::ptrdiff_t
#define EIGEN_DEFAULT_DENSE_INDEX_TYPE int

#include <Eigen/Dense>

int main() {
  const int nb_time = 10000000;

  const int n = 32;
  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> A(n, n);
  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> B(n, n);
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < n; ++i) {
      A(i, j) = 0.0f;
      B(i, j) = 0.0f;
    }
  }

  for (int k = 0; k < nb_time; ++k) {
    A.block(0, 0, n, n) += B.block(0, 0, n, n);
  }

  return 0;
}
====



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/