Re: [eigen] log10 support?

On 8 June 2010 20:25, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:

2010/6/8 Trevor Irons <trevorirons@xxxxxxxxx>:

> Thanks, but I seem to be incapable here.
>
> When I get rid of the float case I can't compile the most basic generic
> implimentation.
>
> For example:
> template<typename Packet>
> inline static Packet ei_plog10(Packet a)
> {
> //typedef typename ei_unpacket_traits<Packet>::type Scalar;
> //return ei_pmul(ei_pset1(Scalar(10)/ei_log(Scalar(10))), ei_plog(a));

Note that the first 10 above should be 1 :) Of course it's not what is
causing your current problems.

> return ei_plog(a); // wrong, testing
> }
>
> Doesn't compile because ei_plog(a) is calling std::log it seems.
> gcc error:
> /eigen/Eigen/src/Core/MathFunctions.h:573:22: error: no matching function
> for call to ‘log(const __vector(4) float&)’

OK so here's some explanation. Your code above is (rightly) calling
ei_plog(packet). See in GenericPacketMath.h the generic definition of
that function:

/** \internal \returns the log of \a a (coeff-wise) */
template<typename Packet> inline static Packet ei_plog(Packet a) {
return ei_log(a); }

So the generic version is to just call ei_log, our wrapper for the
standard std::log function that only makes sense for scalars, not for
packets.

Thus, given, that it actually calls this function, it's not surprising
that it's failing to compile. So are we crazy to have suggested you to
write this code? No!! Because, ta-daaaah, we are overriding the
generic ei_plog by a vectorized one in arch/SSE/MathFunctions.h. (We
are only overriding it for the Scalar type 'float', not other types
such as 'double', and we only do that on SSE, not other platforms. To
signify that we have a working ei_plog for packets, we have the HasLog
enum in ei_packet_traits<T>).

Your compile error means that the compiler fails to find our ei_plog
specialization in arch/SSE/. Here's how this function is defined
there:

static EIGEN_DONT_INLINE EIGEN_UNUSED Packet4f ei_plog(Packet4f x)
{

What strikes me there is that it's not defined as a template
specialization of the ei_plog template. The second thing that strikes
me is that, while the functions in GenericPacketMath.h take const
Packet& arguments, this one takes a Packet argument by value. I think
that both of these things need to be fixed (and can easily explain the
trouble you're having). So, in arch/SSE/MathFunctions.h, can you try
replacing the above line by

template<>

static EIGEN_DONT_INLINE EIGEN_UNUSED Packet4f

ei_plog<Packet4f>(const Packet4f& x)
{

If that works, please apply the same therapy to the other functions in
that file (I'd like us to do that anyway unless someone objects?)

Otherwise please post your current patch

Cheers,
Benoit

>
> I also tried
> return ei_log(a), but still no dice, similiar output from compiler.
>
> However, calling pmul is fine. (Just trying to understand)
> // Just for testing
> template<typename Packet>
> inline static Packet ei_plog10(Packet a)
> {
> return ei_pmul(a, a); // This compiles just fine and gives 'correct'
> answers
> }
>
> What am I doing wrong here? How is ei_plog different from ei_pmul that one
> is callable and one isn't?
>
> Thanks for any insight. Sorry to be wasting all your time with such a simple
> thing.
>
> -Trevor
>
> On 8 June 2010 06:03, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>
>> 2010/6/8 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> > 2010/6/8 Trevor Irons <trevorirons@xxxxxxxxx>:
>> >> OK, I made the changes you suggested/requested. Non SSE is still
>> >> calling
>> >> std::log10. Is this what you wanted?
>> >
>> > Yes, this is good.
>> >
>> > But I have a couple of remarks:
>> >
>> > +/** \internal \returns the log10 of \a a (coeff-wise) */
>> > +template<typename Scalar, typename Packet> inline static Packet
>> > ei_plog10(Packet a)
>> > +{ return ei_pmul(ei_pset1(Scalar(1)/log(Scalar(10))), ei_plog(a)); }
>> > +
>> >
>> > This function should take only the template parameter Packet. Indeed
>> > it's going to be called as
>> >
>> > ei_plog10(somepacket)
>> >
>> > and automatic template param determination will only work if it only
>> > takes a Packet templ param. That means that you need to be able to
>> > determine the Scalar type from the Packet type. For this, do:
>> >
>> > typedef typename ei_unpacket_traits<Packet>::type Scalar;
>> >
>> > also, the caching of the constant value needs to go here. That may
>> > mean that you have to do it yourself, without the macro. Fortunately,
>> > it's easy: this macro was doing:
>> >
>> > #define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \
>> > const Packet4f ei_p4f_##NAME = ei_pset1<float>(X)
>>
>> Question, especially to Gael: do you confirm that this is enough to
>> get caching? Naively I would have thought that 'static' is needed
>> there. But with compiler optimizations, one never knows!
>>
>> Benoit
>>
>> >
>> > so just do yourself:
>> >
>> > const Packet myconstant = ei_pset1(blabla);
>> >
>> > (again, if blabla has type Scalar, ei_pset1<Scalar> will be selected.)
>> >
>> > Then, this:
>> >
>> > +static EIGEN_DONT_INLINE EIGEN_UNUSED Packet4f ei_plog10(Packet4f x)
>> > +{
>> > + _EIGEN_DECLARE_CONST_Packet4f( invlog_0p1, 4.342944819032518721E-1f);
>> > + return ei_pmul(ei_p4f_invlog_0p1, ei_plog(x));
>> > +}
>> > +
>> >
>> > is not needed. My point was that this can all be done in a totally
>> > generic way, not specific to SSE, not specific to float. You don't
>> > have to change any file under SSE/. Your log10 function will be
>> > vectorized on all platforms and for all types where we vectorize log.
>> > That matters, because soon enough we'll probably vectorize it on other
>> > vectorization platforms such as NEON!
>> >
>> > The rest looks fine to me.
>> >
>> >>
>> >> I removed the HasLog10 and added the cost of 1 multiply to the cost in
>> >> Fuctors.h to reflect the multipy. Is this accurate?
>> >
>> > Oh yes it is accurate... more accurate than was needed! These costs
>> > only need to be very rough. One could then discuss whether in the
>> > non-vectorized case this is accurate since log10 could be implemented
>> > to be as fast as log (with a custom approximant). But your formula is
>> > at least accurate in the vectorized case :)
>> >
>> >>
>> >> Thanks for holding my hand on this.
>> >
>> > Thanks for the good work,
>> > Benoit
>> >
>> >>
>> >> -Trevor
>> >>
>> >>
>> >> On 7 June 2010 18:06, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> >>>
>> >>> 2010/6/7 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> >>> > ok so time for another answer :)
>> >>> >
>> >>> > The only thing that I really don't want is for us to add
>> >>> > SSE/AltiVec/whatever code for log10. We must reuse the existing code
>> >>> > for log there.
>> >>> >
>> >>> > Then there are two ways:
>> >>> > a) the way of your patch: add a new custom functor for log10
>> >>> > b) the way i suggested: just let log10 return a product _expression_.
>> >>> >
>> >>> > Your way a) has the advantage of sparing one multiplication (at
>> >>> > least
>> >>> > in the non vectorized case), but b) means less extra code.
>> >>> >
>> >>> > Now Hauke raises a good point, that I overlooked: even with a) you
>> >>> > can
>> >>> > still reuse our vectorization code for log10. So actually, I'm OK
>> >>> > with
>> >>> > your approach, provided that we do that. You can do it generically
>> >>> > for
>> >>> > all types at once (Hauke showed the way for float but there's no
>> >>> > reason to special-case it): edit Core/GenericPacketMath.h and
>> >>> > implement ei_plog10 there like the other functions, returning
>> >>> > ei_pmul(
>> >>> > ei_pset1(log(Scalar(0.1))),
>> >>>
>> >>> woooooops!!
>> >>>
>> >>> replace log(Scalar(0.1)) by Scalar(1) / log(Scalar(10))
>> >>>
>> >>> and sorry Hauke, I saw too late that you already wrote that. Though I
>> >>> guess there's value in insisting that it's faster to multiply by a
>> >>> constant, than to divide by a constant.
>> >>>
>> >>> Benoit
>> >>>
>> >>> > ei_plog(x)
>> >>> > )
>> >>> > The next problem is to cache the result of
>> >>> > ei_pset1(log(Scalar(0.1))),
>> >>> > probably a static constant, maybe have a look at what's done in
>> >>> > ei_plog itself in SSE/. If that sounds obscure, leave that for
>> >>> > later.
>> >>> >
>> >>> > In any case, we don't need a new HasLog10. Since we'll be using
>> >>> > ei_plog, all we need is the existing HasLog.
>> >>> >
>> >>> > Benoit
>> >>> >
>> >>> > 2010/6/7 Trevor Irons <trevorirons@xxxxxxxxx>:
>> >>> >> OK, thanks. I'm on a really old machine and can't check sse here.
>> >>> >> I'll
>> >>> >> test
>> >>> >> your code and generate another patch. Unless I hear that is is
>> >>> >> unwanted.
>> >>> >>
>> >>> >> (This just goes in
>> >>> >> eigenNes/Eigen/src/Core/arch/SSE/MathFunctions.h)
>> >>> >> right?
>> >>> >>
>> >>> >> -Trevor
>> >>> >>
>> >>> >> On 7 June 2010 15:46, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Let me try to answer for Benoit. :)
>> >>> >>>
>> >>> >>> On Mon, Jun 7, 2010 at 11:20 PM, Trevor Irons
>> >>> >>> <trevorirons@xxxxxxxxx>
>> >>> >>> wrote:
>> >>> >>> > Sure, I can do this. I prefer store a minimum of constants. But
>> >>> >>> > if
>> >>> >>> > it
>> >>> >>> > will
>> >>> >>> > be faster, then sure.
>> >>> >>> >
>> >>> >>> > Why won't calls to std::log10 be vectorized? No sse
>> >>> >>> > instructions?
>> >>> >>>
>> >>> >>> This is because std::log is not right away available as an SSE
>> >>> >>> intrinsic and we have a special version based on the code from
>> >>> >>> here:
>> >>> >>> http://gruntthepeon.free.fr/ssemath/
>> >>> >>>
>> >>> >>> Nonetheless I think your patch is the right way to go. We could
>> >>> >>> achieve easily what Benoit suggests by offering a version of
>> >>> >>> ei_plog10
>> >>> >>> such as
>> >>> >>>
>> >>> >>> Packet4f ei_plog10(Packet4f x)
>> >>> >>> {
>> >>> >>> const Packet4f ei_p4f_log10 = ei_plog(ei_pset1(10.0f));
>> >>> >>> return ei_pdiv(ei_plog(x), ei_p4f_log10);
>> >>> >>> }
>> >>> >>>
>> >>> >>> which uses "Packet4f ei_plog(Packet4f x)". The constant would
>> >>> >>> probably
>> >>> >>> be taken care of by a decent compiler, i..e. I don't think we need
>> >>> >>> manual caching.
>> >>> >>>
>> >>> >>> > Also, is there any desire to have log10 defined your way in
>> >>> >>> > Eigen,
>> >>> >>> > for
>> >>> >>> > convenience? Or are you just suggesting I do this locally?
>> >>> >>>
>> >>> >>> I am pretty sure he meant that it would be nice to really have
>> >>> >>> this in
>> >>> >>> Eigen and not just in a local copy... though I am wildly guessing.
>> >>> >>>
>> >>> >>> - Hauke
>> >>> >>>
>> >>> >>> p.s. I am pretty tired an hope that I picked the right conversion
>> >>> >>> factor
>> >>> >>> ...
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>
>