Re: [eigen] Signed or unsigned indexing

On Fri, Jan 20, 2017 at 8:45 PM, François Fayard <fayard@xxxxxxxxxxxxx> wrote:

- Once the war of signed/unsigned comes to an end, there is also the problem of 32/64-bit integers on a 64-bit machine. If you have bandwidth problems, or vectorized code, the choice is obvious: use 32-bit indices. But what should you do in other cases? One has to know that if p is a pointer and k is a 32-bit integer, then to get the address of p[k] one first needs to convert k to a 64-bit integer. So, you might think that using 64-bit indices is better on x86-64. I even found a very contrived benchmark where it is indeed more efficient. But for almost all code, it just does not make any difference.

I remember we observed a clear speedup when we moved from int to ptrdiff_t in Eigen. That was before we released 2.0, so with much older compilers (gcc 4.2) and older hardware (Core 2).

Actually, both questions are closely related: when you start mixing 32-bit and 64-bit integers, I have to admit that unsigned types win, because the conversion is a no-op in that case, whereas signed types require special treatment (sign extension). In Eigen, this happens when using SparseMatrix or PermutationMatrix, for which 32-bit signed integers are used by default to save memory. All these conversions are explicitly handled by a convert_index<To,From>(From x) helper function with assertion checking. To overcome the signed conversion overhead, we could add a convert_positive_index<To,From>(From x) variant asserting that x>=0 and enforcing a cheap conversion (as in the unsigned case).
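To make the two helpers concrete, here is a minimal standalone sketch. It is not Eigen's actual implementation: the namespace `sketch`, the function bodies, and the choice to accept 0 as a valid index in the "positive" variant are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of Eigen

// Checked conversion between two integer index types.
// Asserts that the value survives the round trip and keeps its sign.
template <typename To, typename From>
inline To convert_index(From x) {
  static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                "index types must be integral");
  const To y = static_cast<To>(x);
  assert(static_cast<From>(y) == x && (y < To(0)) == (x < From(0)));
  return y;
}

// Variant for indices known to be non-negative (assumption: 0 is allowed).
// Going through the unsigned counterpart of From lets the compiler use a
// zero extension instead of a sign extension when widening.
template <typename To, typename From>
inline To convert_positive_index(From x) {
  static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                "index types must be integral");
  assert(x >= From(0));
  typedef typename std::make_unsigned<From>::type UFrom;
  const To y = static_cast<To>(static_cast<UFrom>(x));
  // Still check the range in case To is narrower than From.
  assert(y >= To(0) && static_cast<From>(y) == x);
  return y;
}

}  // namespace sketch
```

For example, sketch::convert_positive_index<std::ptrdiff_t>(k) with an int k widens k without a sign extension, while sketch::convert_index<int>(n) with a ptrdiff_t n asserts that n fits in an int.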

Then regarding Eigen moving to unsigned integers (or supporting unsigned integers, that's the same), that's not gonna happen, because there are numerous places where we truly need signed integers. As previously stated by others, this would mean that for every use of the current Index type we would have to carefully think about whether it should be signed or unsigned (considering possible future usages for which negative indices could make sense), and then be extremely careful with every operation (addition, comparison, assignment, etc.) involving two Index types to be sure they are both signed or both unsigned. We have enough subtleties to take care of. Sorry.
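As an illustration of the kind of subtlety meant here, the following small sketch shows how the same arithmetic and comparisons behave once signed and unsigned index types are mixed. The function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Offset between two positions with a signed index type:
// a negative result is representable and means "j is after i".
inline std::ptrdiff_t signed_offset(std::ptrdiff_t i, std::ptrdiff_t j) {
  return i - j;
}

// Same expression with an unsigned index type:
// when i < j the result wraps around modulo 2^N instead of going negative.
inline std::size_t unsigned_offset(std::size_t i, std::size_t j) {
  return i - j;
}

// Comparing a signed value with an unsigned one: the usual arithmetic
// conversions turn the signed operand into unsigned. The explicit cast
// below spells out what the compiler does implicitly for `a < b`.
inline bool mixed_less(int a, unsigned int b) {
  return static_cast<unsigned int>(a) < b;
}
```

With these definitions, signed_offset(2, 5) is -3, unsigned_offset(2, 5) is a huge positive value, and mixed_less(-1, 1u) is false even though -1 < 1 mathematically.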

Regarding the "extra bit" argument, it does make sense for the StorageIndex type when the number of non-zeros is in [2^31..2^32] on a 64-bit system. For instance, SparseMatrix<double,0,unsigned int> would save 1/4 of the memory consumption compared to SparseMatrix<double,0,Index>. I'm sure this justifies all the pain of supporting both signed and unsigned types as the StorageIndex.
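The 1/4 figure can be checked with a back-of-the-envelope computation. The sketch below assumes an 8-byte double, a 64-bit Index, and counts only the per-non-zero storage (one value plus one inner index), ignoring the much smaller outer index array; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes stored per non-zero entry: one scalar value plus one inner index.
template <typename Scalar, typename StorageIndex>
constexpr std::size_t bytes_per_nonzero() {
  return sizeof(Scalar) + sizeof(StorageIndex);
}

// Fraction of memory saved when going from `wide` to `narrow` bytes per entry.
inline double saved_fraction(std::size_t wide, std::size_t narrow) {
  assert(wide > 0 && narrow <= wide);
  return static_cast<double>(wide - narrow) / static_cast<double>(wide);
}

// 64-bit signed index (stand-in for Index on a 64-bit system): 8 + 8 bytes.
static_assert(bytes_per_nonzero<double, std::int64_t>() == 16,
              "assumes an 8-byte double");
// 32-bit unsigned index: 8 + 4 bytes.
static_assert(bytes_per_nonzero<double, std::uint32_t>() == 12,
              "assumes an 8-byte double");
```

Going from 16 to 12 bytes per non-zero gives saved_fraction(16, 12) == 0.25, i.e. the 1/4 mentioned above.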

gael