Re: [eigen] first time usage by me! and very satisfied!



Hi Ben,

ok, here I agree with you. In particular, if the statistics stuff grows
in the future, it is clearly better to keep those features well separated
from the core. This is also consistent with the rest of Eigen. For
instance, several "array" related features are currently grouped in
the Cwise pseudo expression. And, as in the rest of Eigen, I still
want to add an entry point from any matrix expression:

(a+b).stats().standardDeviation();

The reason is that it's not always trivial to write the type of an
expression (the upcoming "auto" keyword will solve this general
issue, but too late for us).
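
A minimal sketch of such an entry point, assuming a CRTP base class in
the spirit of MatrixBase. ExprBase, Vec, Sum and the body of Statistics
are hypothetical stand-ins written for illustration, not Eigen's actual
classes, and the population formula (dividing by n) is an assumption:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

template <typename Expr> class Statistics;

// Hypothetical CRTP base playing the role of MatrixBase.
template <typename Derived>
struct ExprBase {
    const Derived& derived() const { return static_cast<const Derived&>(*this); }
    // Entry point: the caller never has to spell out the expression type.
    Statistics<Derived> stats() const { return Statistics<Derived>(derived()); }
};

// Hypothetical concrete vector.
struct Vec : ExprBase<Vec> {
    std::vector<double> data;
    explicit Vec(std::vector<double> d) : data(std::move(d)) {}
    std::size_t size() const { return data.size(); }
    double coeff(std::size_t i) const { return data[i]; }
};

// Hypothetical lazy sum expression; its type grows with every operator.
template <typename L, typename R>
struct Sum : ExprBase<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    std::size_t size() const { return lhs.size(); }
    double coeff(std::size_t i) const { return lhs.coeff(i) + rhs.coeff(i); }
};

template <typename L, typename R>
Sum<L, R> operator+(const ExprBase<L>& l, const ExprBase<R>& r) {
    return Sum<L, R>(l.derived(), r.derived());
}

// Statistics lives outside the core; it only needs size() and coeff().
template <typename Expr>
class Statistics {
public:
    explicit Statistics(const Expr& e) : expr_(e) {}

    double mean() const {
        assert(expr_.size() > 0);
        double s = 0.0;
        for (std::size_t i = 0; i < expr_.size(); ++i) s += expr_.coeff(i);
        return s / static_cast<double>(expr_.size());
    }

    // Population standard deviation (divides by n); an assumption of this sketch.
    double standardDeviation() const {
        const double m = mean();
        double acc = 0.0;
        for (std::size_t i = 0; i < expr_.size(); ++i) {
            const double d = expr_.coeff(i) - m;
            acc += d * d;
        }
        return std::sqrt(acc / static_cast<double>(expr_.size()));
    }

private:
    const Expr& expr_;  // valid only while the wrapped expression is alive
};
```

With these stand-ins, (a + b).stats().standardDeviation() compiles
without the caller ever naming Sum<Vec, Vec>. The wrapper only holds a
reference, so it should be used within the same full expression as the
temporary it wraps.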

One last thing: from my point of view mean() is much more common than
any other statistics function, so I am not sure it belongs in
the statistics module. Indeed, having to include the statistics module
just to get the mean function, which is nothing more than:

mat.sum()/(mat.cols()*mat.rows())

looks a bit overkill. So I would keep this one in MatrixBase, just like
sum(), minCoeff(), etc.
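
As a sketch of how little is involved, assuming a hypothetical
TinyMatrix stand-in (not Eigen's MatrixBase) that exposes sum(), rows()
and cols():

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix; not Eigen's actual class.
struct TinyMatrix {
    std::size_t nrows;
    std::size_t ncols;
    std::vector<double> data;  // nrows * ncols coefficients

    std::size_t rows() const { return nrows; }
    std::size_t cols() const { return ncols; }

    double sum() const {
        double s = 0.0;
        for (double x : data) s += x;
        return s;
    }

    // mean() sits next to sum(): the sum divided by the coefficient count.
    double mean() const {
        assert(rows() * cols() > 0);
        return sum() / static_cast<double>(cols() * rows());
    }
};
```

The cast matters for integer scalar types in a real implementation;
here the scalar is fixed to double to keep the sketch short.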

anyways, thanks for your time and helpful comments.

gael.

On Sat, Sep 13, 2008 at 10:06 PM, Schleimer, Ben <bensch128@xxxxxxxxx> wrote:
> Hi Gael,
>  I see your point about caching. Maybe it's just better to have the user do the caching on a case-by-case basis. My main concern then is the tight coupling between matrix types and statistics means. It doesn't need to be that way. I really want a mixin mechanism to exist in C++ so the user can choose whether or not to include the statistics methods (on a case-by-case basis).
>
> Maybe this construction would work:
> template <typename Expression>
> class Statistics : public Expression // should this be a VectorType?
> {
>  // Standard VectorType constructors, maybe there needs to be
>  // default c'tors and init methods
>
>  Statistics<Expression> colwise() const
>  {
>    return Statistics<Expression>(VectorType::colwise());
>  }
>
>  Statistics<Expression> rowwise() const
>  {
>    return Statistics<Expression>(VectorType::rowwise());
>  }
>
>
>  Expression mean() const
>  {
>    // return an expression for the mean
>  }
>
>  Expression standardDeviation() const
>  {
>    // Return an expression for the standard deviation
>  }
>
> }
>
> Alternatively, it might be better to have c-style functions which take in Expressions of vectors and return expressions which evaluate to the mean/variance/stddeviation/etc...
>
> Cheers
> Ben
>
>
>
> --- On Sat, 9/13/08, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
>
>> From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
>> Subject: Re: [eigen] first time usage by me! and very satisfied!
>> To: eigen@xxxxxxxxxxxxxxxxxxx
>> Date: Saturday, September 13, 2008, 9:23 AM
>> On Sat, Sep 13, 2008 at 1:27 AM, Schleimer, Ben <bensch128@xxxxxxxxx> wrote:
>> > Sorry about that. I haven't actually worked with Eigen2 yet. Work is mostly on my mind :( (I have a story about two very badly implemented Matrix libraries (in the name of optimization) which is why I'm eager to see eigen2 be well written.)
>>
>> hehe we all have the same expectations here :)
>>
>> so what you propose is basically what I imagined, with an additional
>> feature allowing the cache mechanism (and storage) to be removed at
>> compile time. So, for instance, v.stats() would typically return a
>> Statistics object without cache, which is good. However, I'm still
>> not a fan of this approach.
>>
>> Indeed, my main concern is how to combine this approach with column
>> and row wise stuff?
>> For your information, all reduction methods are also available in a
>> column and row wise flavor, for instance:
>>
>> vec = mat.colwise().sum();
>> vec = mat.colwise().minCoeff();
>> etc...
>>
>> where "mat.colwise().minCoeff()" actually returns an expression.
>>
>> So let's pick an example. With my initial proposal you could write:
>>
>> Vector means = mat.colwise().mean();
>> Vector stddevs = mat.colwise().standardDeviation(means);
>> for (....)
>> {
>>    // use means and stddevs as many times as you want
>> }
>>
>> here you only allocate what is really needed, and there is zero
>> overhead. Of course the user might forget to pass the precomputed mean
>> to standardDeviation:
>>
>> Vector means = mat.colwise().mean();
>> Vector stddevs = mat.colwise().standardDeviation();
>>
>> in which case the means are computed twice. IMO, this is the only
>> little drawback of this approach. So here comes the caching
>> mechanism.
>>
>> On the other hand, with the cache enabled you would either have to
>> allocate all the data that might be cached, in which case the
>> Statistics object might become very large, or rely on dynamic
>> allocations, in which case you still have to store the pointers and
>> pay for the dynamic allocation cost. Of course you could also have
>> compile time bit flags to tell which data you want to cache, but that
>> sounds rather tedious for the user. Another argument is that anytime
>> you want to access a cached value you still have to pay for an extra
>> "if". Furthermore, the user has to take care to access the matrix
>> data through the Statistics object so that the cache is automatically
>> invalidated...
>>
>> So all in all I don't think the little advantage offered by this
>> approach outweighs its drawbacks.
>>
>> any counter-arguments ?
>>
>> gael.
>>
>> >>
>> >> ah! after answering the rest of the comments I now see how it could
>> >> be done. maybe you were thinking of something like this:
>> >>
>> >> VectorType v;
>> >> // play with v..
>> >> // now you want some statistics on v:
>> >>
>> >> Statistics<VectorType> s(v);
>> >>
>> >> s.mean();
>> >> s.variance();
>> >> etc.
>> >> etc.
>> >>
>> >> // MatrixBase would define:
>> >> // Statistics<Derived> stats() const { return derived(); }
>> >> // so that you can also do:
>> >>
>> >> v.stats().mean();
>> >>
>> >
>> > Actually I was thinking of:
>> > #include <cstddef>
>> >
>> > template <typename Scalar>
>> > struct DirtyCacher
>> > {
>> >  DirtyCacher() : _dirty(true), _mean() {}
>> >  bool dirty() const { return _dirty; }
>> >  void setDirty(bool f) { _dirty = f; }
>> >  // Stats is the Statistics<...> instantiation that owns this cacher
>> >  template <typename Stats>
>> >  Scalar mean(const Stats* obj) const
>> >  {
>> >    if(dirty())
>> >    {
>> >      _mean = obj->meanInternal();
>> >      _dirty = false;
>> >    }
>> >    return _mean;
>> >  }
>> >
>> >  // mutable: the const mean() refreshes the cache
>> >  mutable bool _dirty;
>> >  mutable Scalar _mean;
>> > };
>> >
>> > template <typename Scalar>
>> > struct NoCacher
>> > {
>> >  bool dirty() const { return true; }
>> >  void setDirty(bool f) { /*nothing*/ }
>> >  template <typename Stats>
>> >  Scalar mean(const Stats* obj) const
>> >  {
>> >    return obj->meanInternal();
>> >  }
>> > };
>> >
>> > template <typename VectorType, typename Cacher = DirtyCacher<typename VectorType::Scalar> >
>> > class Statistics
>> > {
>> > public:
>> >  typedef typename VectorType::Scalar Scalar;
>> >  Statistics(VectorType v)
>> >  : _v(v) {}
>> >  Scalar mean() const
>> >  {
>> >    return cacher.mean(this);
>> >  }
>> >  const VectorType& value() const
>> >  {
>> >    // always assume this isn't changed
>> >    return _v;
>> >  }
>> >  VectorType& value()
>> >  {
>> >    cacher.setDirty(true); // always assume the data is changed
>> >    return _v;
>> >  }
>> >  ...
>> > protected:
>> >  Cacher cacher;
>> >  VectorType _v;
>> >  friend struct DirtyCacher<Scalar>;
>> >  friend struct NoCacher<Scalar>;
>> >  Scalar meanInternal() const
>> >  {
>> >    // always calculate the mean here
>> >    Scalar ret = 0;
>> >    for(size_t i=0; i<_v.size(); ++i)
>> >    {
>> >      ret += _v[i];
>> >    }
>> >    return ret / _v.size();
>> >  }
>> > };
>> >
>> > Basically, if the user chooses to use the DirtyCacher, he should get a mean() lookup speed increase but he'll lose 1+sizeof(Scalar) bytes. If he uses NoCacher, he gets the space back but with a lookup speed loss. If there are more items to cache, then we can use a bitfield to minimize the space loss (e.g. use meanDirty, varianceDirty, sortDirty).
>> > Also, the other nice thing about this design is that VectorType doesn't need to know anything about the existence of the Statistics class.
>> > Returning the value as non-const could maybe be sped up if VectorType can indicate whether an operation has modified it or not. This is just the most straightforward thing I could think of.
>> >
>> >>
>> >> actually the pb is that you don't have access to "*this"... but you
>> >> can use the result of any function as default argument value, for
>> >> instance:
>> >>
>> >> float foo(float v = random()) {...}
>> >>
>> >> is ok.
>> >
>> > Ahh, i did not know that... interesting..
>> >
>> > cheers
>> > Ben
>> >
>> >
>> >
>> >
>> >>
>> >> >> I prefer that to something like:
>> >> >> Scalar stddev(Scalar*  returnedMean = 0) const;
>> >> >> because in case you have already computed the mean value you cannot
>> >> >> tell stddev to use it...
>> >> >
>> >> > This is confusing. What does the function return if returnedMean == 0?
>> >> > It might be better to cache the mean and not return it in the stddev() method. The user has access to it in mean(), no?
>> >>
>> >>  I think I was not clear here. I meant: "I prefer the above approach
>> >> rather than the following common solution".
>> >> and to be clear, its implementation would be:
>> >> Scalar stddev(Scalar*  returnedMean = 0) const
>> >> {
>> >>    Scalar m = mean();
>> >>    if (returnedMean) *returnedMean = m;
>> >>    // compute and returns stddev
>> >> }
>> >>
>> >> but again I don't like this approach !!
>> >>
>> >> >>
>> >> >> so to summarize we would add:
>> >> >>
>> >> >> * mean()
>> >> >>
>> >> >> * standardDeviation([precomputed_mean])
>> >> >> yeah, eventually "standardDeviation" is not too long
>> >> >>
>> >> >> * variance([precomputed_mean])
>> >> >>
>> >> >> * median()
>> >> >> maybe here we could have an optional argument to tell if the vector is
>> >> >> already sorted ??
>> >> >
>> >> > We could have a cached sorted flag...
>> >>
>> >> same pb as above
>> >>
>> >>
>> >> gael.
>> >>
>> >> >>
>> >> >> * sort()
>> >> >> median needs a sort algo, and here I really mean "sort", not "sorted",
>> >> >> so it is an in-place sort
>> >> >>
>> >> >
>> >> >
>> >> > Cheers
>> >> > Ben
>> >> >
>> >> >
>> >> >>
>> >> >> cheers,
>> >> >> gael.
>> >> >>
>> >> >> On Thu, Sep 11, 2008 at 8:29 PM, Andre Krause <post@xxxxxxxxxxxxxxxx> wrote:
>> >> >> > dear list, just wanted to let you all know i am using eigen2 alpha 07 for the
>> >> >> > first time - and i am very satisfied. works like a charm, with no problems on
>> >> >> > win32 and visualc++ 2008.
>> >> >> >
>> >> >> > though i am missing some basic convenience functions like mean, median, stdev
>> >> >> > etc. i already heard on #eigen that they are planned for some future release.
>> >> >> >
>> >> >> > kind regards,
>> >> >> >        andre
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >> >
>> >
>> >
>> >
>
>
>


