Re: [eigen] Overflow in sum()



On Thu, Apr 2, 2009 at 9:06 PM, Patrick Mihelich
<patrick.mihelich@xxxxxxxxx> wrote:
> Well, I remember now, in our case we actually avoid the overflow issue by
> quantizing our feature vector elements to only 4 bits(!), then we can add
> many of them together as uint8's without fear. One use case we have is
> taking the average of about 50 176-dimensional uint8 vectors, with uint8
> vector as the result. With uint8 SIMD specializations, Eigen might
> already match our hand-written code for this; I would be very
> interested to see.
>
> We also need very efficient L1 distance between two such vectors, returning
> uint as the accumulation type. Is there no MatrixBase::abs() function? It
> would be cool to write something like
>
> uint distance = (v1 - v2).abs().sum(); // maybe insert cast<uint>()
> somewhere

You can already do that; the correct syntax is:

uint distance = (v1 - v2).cwise().abs().sum();

or, equivalently:

distance = (v1-v2).lpNorm<1>();
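For comparison, here is the scalar loop this corresponds to outside Eigen (a plain C++ sketch; the function name is mine, not an Eigen API):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// L1 distance between two uint8 vectors, accumulated in a wider
// unsigned type so the sum of absolute differences cannot overflow.
unsigned l1_distance(const std::vector<std::uint8_t>& v1,
                     const std::vector<std::uint8_t>& v2)
{
    unsigned acc = 0;
    for (std::size_t i = 0; i < v1.size(); ++i)
        acc += static_cast<unsigned>(std::abs(int(v1[i]) - int(v2[i])));
    return acc;
}
```

With 176-dimensional uint8 vectors each per-element difference is at most 255, so a 32-bit unsigned accumulator is comfortably large enough.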

> and have it compile to use _mm_sad_epu8 and _mm_add_epi16 to compute the
> sum-of-absolute-differences. This seems like it requires some tricky
> expression-template specialization. I can send example code but this is
> getting a little off-topic, maybe should be a different thread.

This is something that could easily be done behind the
"(v1-v2).lpNorm<1>()" syntax. Catching the other form would be too
tricky.

> It would be nice to have a transparent way of controlling accumulation type
> and policy like Benoit's proposal, but maybe it's not sufficiently clear yet
> what typical use cases are... if my understanding is correct, the cast()
> approach should be adequate for my purposes. How does cast() influence SIMD
> optimizations? And probably there should be a prominent warning to only use
> scalar types like uint8 if you know what you're doing :).

Indeed, it seems to me very difficult to handle all common uses of
uint8 through a generic API, because they are all very specific and
optimal performance requires very tricky assembly. For instance, an
efficient implementation of vec_of_uint8.cast<uint>().sum() would be
something like:

__m128i r0 = 0, r1 = 0, r2 = 0, r3 = 0;
for each block of 16 uint8:
   __m128i b = load block;
   unpack b into 4 __m128i registers b0..b3 of int32;
   r0 += b0;
   ...
   r3 += b3;
end for
r0 += r1 + r2 + r3;
reduce r0 to a single int32
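Ignoring SIMD, the widening accumulation above reduces to the following scalar loops (a sketch, function names are mine): accumulating directly in uint8 wraps modulo 256, while widening first gives the exact sum.

```cpp
#include <cstdint>
#include <vector>

// Sum accumulated directly in uint8: wraps modulo 256 on overflow.
std::uint8_t narrow_sum(const std::vector<std::uint8_t>& v)
{
    std::uint8_t acc = 0;
    for (std::uint8_t x : v)
        acc = static_cast<std::uint8_t>(acc + x);
    return acc;
}

// Scalar equivalent of vec_of_uint8.cast<uint>().sum(): each element
// is widened to unsigned before accumulation, so the result is exact.
unsigned widened_sum(const std::vector<std::uint8_t>& v)
{
    unsigned acc = 0;
    for (std::uint8_t x : v)
        acc += x;
    return acc;
}
```

For example, with 50 elements of value 200 the exact sum is 10000; the narrow version wraps to 10000 % 256 == 16.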

However, this is something we cannot do in a generic way (well,
everything is doable, but it would require a very complex mechanism).

I suggest that people who are interested in good uint8 support list
their typical use cases, and then we can see what is doable.

Also, an alternative way to control the return type of sum/dot/etc. is
to add templated versions of those functions:

vec.sum<uint>();
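A free-function sketch of what such a templated reduction could look like (hypothetical, not current Eigen API): the caller picks the accumulation type independently of the element type.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical templated sum: Acc is the accumulation/return type,
// chosen by the caller independently of the element type T.
template <typename Acc, typename T>
Acc sum_as(const std::vector<T>& v)
{
    Acc acc = Acc(0);
    for (const T& x : v)
        acc += static_cast<Acc>(x);
    return acc;
}
```

Here sum_as<unsigned>(vec_of_uint8) accumulates in 32 bits, while sum_as<std::uint8_t> keeps today's wrapping behavior.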

cheers,
gael.

> Cheers,
> Patrick
>
> On Thu, Apr 2, 2009 at 7:04 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> The idea was also to use it in matrix products, dot product, etc.
>> But then I have doubts: if all these operations really overflow, then
>> maybe the user really needs to use a bigger scalar type in the first
>> place. There isn't much of a point in using uint8 if you need to cast
>> every time you do something nontrivial... I'd need to see how this is
>> used in practice.
>>
>> Benoit
>>
>> 2009/4/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> > I have to say I'm not sure I fully understand your proposal. Where
>> > would you use the additional scalar type and ei_saturate? Only in
>> > sum()? To tweak the return type of operator+ and so on?
>> >
>> >
>> > On Thu, Apr 2, 2009 at 1:56 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> > wrote:
>> >> I agree that for Jens' use case the cast() approach works well; now
>> >> what do the dense-matrix-of-uint8 guys think? If they need to cast all
>> >> the time then I guess this isn't convenient. Putting that into the
>> >> NumTraits had the advantage of being very extensible without making
>> >> the rest of Eigen significantly more complex. But we'll do it only if
>> >> people need it...
>> >>
>> >> Benoit
>> >>
>> >> 2009/4/2 Jens Mueller <jens.k.mueller@xxxxxx>:
>> >>> Oh yes. You are right. For me cast<>() is best. And even for the
>> >>> others,
>> >>> I guess. Just cast the values such that there will be no overflows.
>> >>>
>> >>> Regards,
>> >>> Jens
>> >>>
>> >>> Gael Guennebaud wrote:
>> >>>> what about:
>> >>>>
>> >>>> mat.cast<uint>().sum()
>> >>>>
>> >>>> (this already works without creating any temporary matrix)
>> >>>>
>> >>>> gael.
>> >>>>
>> >>>> On Thu, Apr 2, 2009 at 11:03 AM, Jens Mueller <jens.k.mueller@xxxxxx>
>> >>>> wrote:
>> >>>> > The first proposed plugin solution is acceptable for me, since
>> >>>> > this plugin feature is neat anyway.
>> >>>> > If this overflow thing really matters for users, then extending
>> >>>> > NumTraits is reasonable, since people with custom matrices write
>> >>>> > their
>> >>>> > own NumTraits declarations anyway. So at the moment there are three
>> >>>> > users bothered with this problem. Is this considered enough?
>> >>>> >
>> >>>> > Regards,
>> >>>> > Jens
>> >>>> >
>> >>>> > Benoit Jacob wrote:
>> >>>> >> Thanks for your input; here's a proposal.
>> >>>> >>
>> >>>> >> First of all, what I really don't want is to add another template
>> >>>> >> parameter to Matrix, or to add more enums/typedefs in xpr classes.
>> >>>> >> So I'm trying to put all the stuff related to this issue in the
>> >>>> >> numeric type itself. This way, only people who use the feature pay
>> >>>> >> for it.
>> >>>> >>
>> >>>> >> So here's the proposal: in NumTraits<T> we just add another typedef
>> >>>> >> for the accumulation type, and we define a function ei_saturate<T>(T t)
>> >>>> >> for all possible accumulation types T. For all basic types
>> >>>> >> (including uint8) we define the accumulation type to be just T, and
>> >>>> >> ei_saturate(t) returns just t, so the default for integer types is
>> >>>> >> the clipping behavior. But then we can define (or let the user
>> >>>> >> define) more types, like a wrapper around uint8 with the
>> >>>> >> accumulation type set to uint32 and ei_saturate(t) actually
>> >>>> >> saturating to 255, etc. The user can then use a matrix with that
>> >>>> >> type as scalar type. If for a different matrix he wants a different
>> >>>> >> accumulation type, he just uses a different scalar type with a
>> >>>> >> different NumTraits<T>::AccumType. In fact, it's even OK to let
>> >>>> >> the AccumType be a template parameter of the wrapper-around-uint8
>> >>>> >> scalar type.
>> >>>> >>
>> >>>> >> Cheers,
>> >>>> >> Benoit
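[A minimal sketch of the wrapper type described above; all names here are hypothetical, not actual Eigen API:]

```cpp
#include <cstdint>
#include <vector>

// Sketch of the proposal: a uint8 wrapper whose accumulation type is
// uint32, with ei_saturate-style clamping back to the [0, 255] range.
// All names are hypothetical, not actual Eigen API.
struct SatUint8 {
    std::uint8_t value;
    using AccumType = std::uint32_t;
};

// ei_saturate analogue: clamp an accumulated value back to uint8 range.
inline std::uint8_t saturate_u8(SatUint8::AccumType t)
{
    return t > 255 ? 255 : static_cast<std::uint8_t>(t);
}

// A sum() that accumulates in AccumType and saturates the final result.
inline std::uint8_t saturating_sum(const std::vector<SatUint8>& v)
{
    SatUint8::AccumType acc = 0;
    for (const SatUint8& x : v)
        acc += x.value;
    return saturate_u8(acc);
}
```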
>> >>>> >>
>> >>>> >> 2009/4/1 Christian Mayer <mail@xxxxxxxxxxxxxxxxx>:
>> >>>> >> > Benoit Jacob schrieb:
>> >>>> >> >> ah that's yet another different problem...
>> >>>> >> >>
>> >>>> >> >> with sparse matrices, the overflow problem arises only with a
>> >>>> >> >> few
>> >>>> >> >> operations like sum().
>> >>>> >> >>
>> >>>> >> >> But with what you say, dense matrices of uint8, the overflow
>> >>>> >> >> can
>> >>>> >> >> happen with a lot more operations: for example, any matrix
>> >>>> >> >> product
>> >>>> >> >> will typically overflow! So I guess that a matrix product of
>> >>>> >> >> uint8
>> >>>> >> >> matrices should return a matrix with a bigger scalar type...
>> >>>> >> >
>> >>>> >> > I'm working in the area of embedded software development, where
>> >>>> >> > huge amounts of numerical software have to be dealt with in real
>> >>>> >> > time. As some of that code is quite old, it's mostly written in
>> >>>> >> > fixed point, i.e. it uses normal integer operations (the
>> >>>> >> > variables usually have to be multiplied by a factor, and an
>> >>>> >> > offset has to be added, to get the "real world meaning" of each
>> >>>> >> > variable).
>> >>>> >> >
>> >>>> >> > In this context, lots of programmers are looking at data types
>> >>>> >> > and possible overflows just to handle that problem, as each
>> >>>> >> > multiplication and addition could cause trouble.
>> >>>> >> >
>> >>>> >> > And there are usually two solutions: saturate or clip (i.e. just
>> >>>> >> > use the modulo value).
>> >>>> >> >
>> >>>> >> > => that's too far for eigen to handle
>> >>>> >> >
>> >>>> >> > But it might be a nice solution to offer (as a template
>> >>>> >> > parameter) a different data type as the result, so that the
>> >>>> >> > Eigen user can handle this requirement, each time it could
>> >>>> >> > appear, in the best way for his algorithm. (And if possible we
>> >>>> >> > could also offer a saturating and a clipping version of the
>> >>>> >> > algorithm.)
>> >>>> >> >
>> >>>> >> > CU,
>> >>>> >> > Chris
>> >>>> >> >
Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/