Re: [eigen] Signed or unsigned indexing
On 2017-01-21 22:56, Gael Guennebaud wrote:
> That's easy to check, I simply measured:
>
>   EIGEN_DONT_INLINE
>   void foo(MatrixXf &A, MatrixXf &B, Index i, Index j, Index n, Index m)
>   {
>     A.block(i, j, n, m) += B.block(i+1, j+1, n, m);
>   }
>
> and got 15% of speed difference between int and ptrdiff_t using either
> gcc6 or clang3.9.

I have just tested this kind of code on a 32x32 matrix, and with Eigen the
version using 64-bit integers is indeed 25% faster than the version using
32-bit integers.

But if I write the same thing in plain C++ (without Eigen), with for loops,
I get exactly the same timings with 32-bit and 64-bit integers using both
the Intel compilers and clang. With gcc, the version with 32-bit integers
is 4% faster (see the code below).

My feeling is that because Eigen is written with intrinsics, many compiler
optimizations are turned off. Therefore, the difference between 32-bit and
64-bit integers is very Eigen-specific. If there were such a difference
between 32-bit and 64-bit indexing in plain C/C++, nobody would index with
ints in C anymore.

> I measured the conversion of 1000 integers from 'int' or 'unsigned int'
> to 'long' and observe 25% of speed difference using either clang or gcc
> (in favour of unsigned int).

My claim was not that there was no difference between converting int and
unsigned to long, but rather that this conversion does not happen as often
as you might think and almost never makes a difference. For instance, a
compiler might change
  void f(double* p, long n) {
    for (int k = 0; k < n; ++k) {
      p[k] += 1.0;
    }
  }
to (look at the new type for k)
  void f(double* p, long n) {
    for (long k = 0; k < n; ++k) {
      p[k] += 1.0;
    }
  }
because overflow of signed integers is undefined behavior. This
transformation would be impossible with unsigned integers because of their
wrapping behavior. So signed integers give more flexibility to the
compiler. Even though the conversion from int to long is more costly than
the conversion from unsigned to long, with signed integers the compiler is
often allowed to remove this conversion entirely, or at least to hoist it
out of the inner loop.
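
To make the wrapping argument concrete, here is a minimal sketch (my own
illustration, the function name g is arbitrary and nothing here comes from
Gael's code): with an unsigned counter the index is required to wrap modulo
2^32, so the compiler cannot simply rewrite k as a 64-bit counter.

  void g(double* p, long n) {
    // If n were larger than UINT_MAX, k would wrap back to 0 before ever
    // reaching n and the loop would keep rewriting p[0..UINT_MAX]. The
    // compiler has to preserve that behavior, so the zero-extension of k
    // stays inside the loop.
    for (unsigned int k = 0; k < n; ++k) {
      p[k] += 1.0;
    }
  }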
Regarding 32-bit vs 64-bit integers, there are some corner cases where one
is clearly better than the other. Consider this code, which tries to find
the index of the smallest element of an array:
  int index_smallest(float* p, int n) {
    float small_value = std::numeric_limits<float>::max();
    int small_k = -1;
    for (int k = 0; k < n; ++k) {
      if (p[k] < small_value) {
        small_value = p[k];
        small_k = k;
      }
    }
    return small_k;
  }
The Intel compiler is smart enough to vectorize this loop (it is a
reduction). But as both small_value and small_k have to be carried in
vector registers, one can see that it is more efficient when their types
have the same size, and this is indeed what happens. So if your array
contains floats, use int for the indices; using 64-bit integers might lead
to a 2x slowdown. But if your array contains doubles, it is better to use
a 64-bit integer :-)
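
For the double case, the matching version would look like this sketch (my
own illustration, not code from the original benchmark); the value and the
index carried through the vectorized reduction are then both 64 bits wide:

  #include <cstddef>
  #include <limits>

  std::ptrdiff_t index_smallest(const double* p, std::ptrdiff_t n) {
    double small_value = std::numeric_limits<double>::max();
    std::ptrdiff_t small_k = -1;
    for (std::ptrdiff_t k = 0; k < n; ++k) {
      // Both the value (double) and the index (ptrdiff_t) are 64-bit,
      // so they can share the same vector lane width in the reduction.
      if (p[k] < small_value) {
        small_value = p[k];
        small_k = k;
      }
    }
    return small_k;
  }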
Francois
PS: Here is the code used for my benchmarks
==== Plain C++
#include <cstddef>

//#define INT std::ptrdiff_t
#define INT int

int main() {
  const int nb_time = 10000000;
  const INT n = 32;

  float* A = new float[n * n];
  float* B = new float[n * n];
  for (INT j = 0; j < n; ++j) {
    for (INT i = 0; i < n; ++i) {
      A[j * n + i] = 0.0f;
      B[j * n + i] = 0.0f;
    }
  }

  for (int k = 0; k < nb_time; ++k) {
    for (INT j = 0; j < n; ++j) {
      for (INT i = 0; i < n; ++i) {
        A[j * n + i] += B[j * n + i];
      }
    }
  }

  return 0;
}
====
==== Eigen
//#define EIGEN_DEFAULT_DENSE_INDEX_TYPE std::ptrdiff_t
#define EIGEN_DEFAULT_DENSE_INDEX_TYPE int

#include <Eigen/Dense>

int main() {
  const int nb_time = 10000000;
  const int n = 32;

  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> A(n, n);
  Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> B(n, n);
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < n; ++i) {
      A(i, j) = 0.0f;
      B(i, j) = 0.0f;
    }
  }

  for (int k = 0; k < nb_time; ++k) {
    A.block(0, 0, n, n) += B.block(0, 0, n, n);
  }

  return 0;
}
====