>
> I also tried
> return ei_log(a), but still no dice, similiar output from compiler.
>
> However, calling pmul is fine. (Just trying to understand)
> // Just for testing
> template<typename Packet>
> inline static Packet ei_plog10(Packet a)
> {
> return ei_pmul(a, a); // This compiles just fine and gives 'correct'
> answers
> }
>
> What am I doing wrong here? How is ei_plog different from ei_pmul that one
> is callable and one isn't?
>
> Thanks for any insight. Sorry to be wasting all your time with such a simple
> thing.
>
> -Trevor
>
> On 8 June 2010 06:03, Benoit Jacob <
jacob.benoit.1@xxxxxxxxx> wrote:
>>
>> 2010/6/8 Benoit Jacob <
jacob.benoit.1@xxxxxxxxx>:
>> > 2010/6/8 Trevor Irons <
trevorirons@xxxxxxxxx>:
>> >> OK, I made the changes you suggested/requested. Non SSE is still
>> >> calling
>> >> std::log10. Is this what you wanted?
>> >
>> > Yes, this is good.
>> >
>> > But I have a couple of remarks:
>> >
>> > +/** \internal \returns the log10 of \a a (coeff-wise) */
>> > +template<typename Scalar, typename Packet> inline static Packet
>> > ei_plog10(Packet a)
>> > +{ return ei_pmul(ei_pset1(Scalar(1)/log(Scalar(10))), ei_plog(a)); }
>> > +
>> >
>> > This function should take only the template parameter Packet. Indeed
>> > it's going to be called as
>> >
>> > ei_plog10(somepacket)
>> >
>> > and automatic template param determination will only work if it only
>> > takes a Packet templ param. That means that you need to be able to
>> > determine the Scalar type from the Packet type. For this, do:
>> >
>> > typedef typename ei_unpacket_traits<Packet>::type Scalar;
>> >
>> > also, the caching of the constant value needs to go here. That may
>> > mean that you have to do it yourself, without the macro. Fortunately,
>> > it's easy: this macro was doing:
>> >
>> > #define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \
>> > const Packet4f ei_p4f_##NAME = ei_pset1<float>(X)
>>
>> Question, especially to Gael: do you confirm that this is enough to
>> get caching? Naively I would have thought that 'static' is needed
>> there. But with compiler optimizations, one never knows!
>>
>> Benoit
>>
>> >
>> > so just do yourself:
>> >
>> > const Packet myconstant = ei_pset1(blabla);
>> >
>> > (again, if blabla has type Scalar, ei_pset1<Scalar> will be selected.)
>> >
>> > Then, this:
>> >
>> > +static EIGEN_DONT_INLINE EIGEN_UNUSED Packet4f ei_plog10(Packet4f x)
>> > +{
>> > + _EIGEN_DECLARE_CONST_Packet4f( invlog_0p1, 4.342944819032518721E-1f);
>> > + return ei_pmul(ei_p4f_invlog_0p1, ei_plog(x));
>> > +}
>> > +
>> >
>> > is not needed. My point was that this can all be done in a totally
>> > generic way, not specific to SSE, not specific to float. You don't
>> > have to change any file under SSE/. Your log10 function will be
>> > vectorized on all platforms and for all types where we vectorize log.
>> > That matters, because soon enough we'll probably vectorize it on other
>> > vectorization platforms such as NEON!
>> >
>> > The rest looks fine to me.
>> >
>> >>
>> >> I removed the HasLog10 and added the cost of 1 multiply to the cost in
>> >> Fuctors.h to reflect the multipy. Is this accurate?
>> >
>> > Oh yes it is accurate... more accurate than was needed! These costs
>> > only need to be very rough. One could then discuss whether in the
>> > non-vectorized case this is accurate since log10 could be implemented
>> > to be as fast as log (with a custom approximant). But your formula is
>> > at least accurate in the vectorized case :)
>> >
>> >>
>> >> Thanks for holding my hand on this.
>> >
>> > Thanks for the good work,
>> > Benoit
>> >
>> >>
>> >> -Trevor
>> >>
>> >>
>> >> On 7 June 2010 18:06, Benoit Jacob <
jacob.benoit.1@xxxxxxxxx> wrote:
>> >>>
>> >>> 2010/6/7 Benoit Jacob <
jacob.benoit.1@xxxxxxxxx>:
>> >>> > ok so time for another answer :)
>> >>> >
>> >>> > The only thing that I really don't want is for us to add
>> >>> > SSE/AltiVec/whatever code for log10. We must reuse the existing code
>> >>> > for log there.
>> >>> >
>> >>> > Then there are two ways:
>> >>> > a) the way of your patch: add a new custom functor for log10
>> >>> > b) the way i suggested: just let log10 return a product _expression_.
>> >>> >
>> >>> > Your way a) has the advantage of sparing one multiplication (at
>> >>> > least
>> >>> > in the non vectorized case), but b) means less extra code.
>> >>> >
>> >>> > Now Hauke raises a good point, that I overlooked: even with a) you
>> >>> > can
>> >>> > still reuse our vectorization code for log10. So actually, I'm OK
>> >>> > with
>> >>> > your approach, provided that we do that. You can do it generically
>> >>> > for
>> >>> > all types at once (Hauke showed the way for float but there's no
>> >>> > reason to special-case it): edit Core/GenericPacketMath.h and
>> >>> > implement ei_plog10 there like the other functions, returning
>> >>> > ei_pmul(
>> >>> > ei_pset1(log(Scalar(0.1))),
>> >>>
>> >>> woooooops!!
>> >>>
>> >>> replace log(Scalar(0.1)) by Scalar(1) / log(Scalar(10))
>> >>>
>> >>> and sorry Hauke, I saw too late that you already wrote that. Though I
>> >>> guess there's value in insisting that it's faster to multiply by a
>> >>> constant, than to divide by a constant.
>> >>>
>> >>> Benoit
>> >>>
>> >>> > ei_plog(x)
>> >>> > )
>> >>> > The next problem is to cache the result of
>> >>> > ei_pset1(log(Scalar(0.1))),
>> >>> > probably a static constant, maybe have a look at what's done in
>> >>> > ei_plog itself in SSE/. If that sounds obscure, leave that for
>> >>> > later.
>> >>> >
>> >>> > In any case, we don't need a new HasLog10. Since we'll be using
>> >>> > ei_plog, all we need is the existing HasLog.
>> >>> >
>> >>> > Benoit
>> >>> >
>> >>> > 2010/6/7 Trevor Irons <
trevorirons@xxxxxxxxx>:
>> >>> >> OK, thanks. I'm on a really old machine and can't check sse here.
>> >>> >> I'll
>> >>> >> test
>> >>> >> your code and generate another patch. Unless I hear that is is
>> >>> >> unwanted.
>> >>> >>
>> >>> >> (This just goes in
>> >>> >> eigenNes/Eigen/src/Core/arch/SSE/MathFunctions.h)
>> >>> >> right?
>> >>> >>
>> >>> >> -Trevor
>> >>> >>
>> >>> >> On 7 June 2010 15:46, Hauke Heibel <
hauke.heibel@xxxxxxxxxxxxxx>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> Let me try to answer for Benoit. :)
>> >>> >>>
>> >>> >>> On Mon, Jun 7, 2010 at 11:20 PM, Trevor Irons
>> >>> >>> <
trevorirons@xxxxxxxxx>
>> >>> >>> wrote:
>> >>> >>> > Sure, I can do this. I prefer store a minimum of constants. But
>> >>> >>> > if
>> >>> >>> > it
>> >>> >>> > will
>> >>> >>> > be faster, then sure.
>> >>> >>> >
>> >>> >>> > Why won't calls to std::log10 be vectorized? No sse
>> >>> >>> > instructions?
>> >>> >>>
>> >>> >>> This is because std::log is not right away available as an SSE
>> >>> >>> intrinsic and we have a special version based on the code from
>> >>> >>> here:
>> >>> >>>
http://gruntthepeon.free.fr/ssemath/
>> >>> >>>
>> >>> >>> Nonetheless I think your patch is the right way to go. We could
>> >>> >>> achieve easily what Benoit suggests by offering a version of
>> >>> >>> ei_plog10
>> >>> >>> such as
>> >>> >>>
>> >>> >>> Packet4f ei_plog10(Packet4f x)
>> >>> >>> {
>> >>> >>> const Packet4f ei_p4f_log10 = ei_plog(ei_pset1(10.0f));
>> >>> >>> return ei_pdiv(ei_plog(x), ei_p4f_log10);
>> >>> >>> }
>> >>> >>>
>> >>> >>> which uses "Packet4f ei_plog(Packet4f x)". The constant would
>> >>> >>> probably
>> >>> >>> be taken care of by a decent compiler, i..e. I don't think we need
>> >>> >>> manual caching.
>> >>> >>>
>> >>> >>> > Also, is there any desire to have log10 defined your way in
>> >>> >>> > Eigen,
>> >>> >>> > for
>> >>> >>> > convenience? Or are you just suggesting I do this locally?
>> >>> >>>
>> >>> >>> I am pretty sure he meant that it would be nice to really have
>> >>> >>> this in
>> >>> >>> Eigen and not just in a local copy... though I am wildly guessing.
>> >>> >>>
>> >>> >>> - Hauke
>> >>> >>>
>> >>> >>> p.s. I am pretty tired an hope that I picked the right conversion
>> >>> >>> factor
>> >>> >>> ...
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>
>