Re: [eigen] first time usage by me! and very satisfied!

Hi Gael,
  I see your point about caching. Maybe it's better to have the user do the caching on a case-by-case basis. My main concern then is the tight coupling between matrix types and statistics methods. It doesn't need to be that way. I really want a mixin mechanism to exist in C++ so the user can choose whether to include the statistics methods (on a case-by-case basis).

Maybe this construction would work:
template <typename Expression>
class Statistics : public Expression // should this be a VectorType? 
{
public:
  // Standard VectorType constructors, maybe there needs to be 
  // default c'tors and init methods

  Statistics<Expression> colwise() const
  {
    // Expression is a dependent base, so the call must be qualified
    return Statistics<Expression>(Expression::colwise());
  }

  Statistics<Expression> rowwise() const
  {
    return Statistics<Expression>(Expression::rowwise());
  }

  Expression mean() const
  {
    // return an expression for the mean
  }

  Expression standardDeviation() const
  {
    // Return an expression for the standard deviation
  }

};
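
To make the mixin idea concrete, here is a minimal sketch that compiles on its own. It assumes a stand-in vector type rather than a real Eigen expression: PlainVector and WithStatistics are hypothetical names used only for illustration, and nothing here is Eigen API. It returns a plain scalar instead of an expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector type; it knows nothing about statistics.
struct PlainVector {
    std::vector<double> data;
    std::size_t size() const { return data.size(); }
    double operator[](std::size_t i) const { return data[i]; }
};

// Mixin layer: adds statistics on top of any Base that exposes
// size() and operator[]. Users who do not need it keep using Base.
template <typename Base>
struct WithStatistics : Base {
    WithStatistics() {}
    explicit WithStatistics(const Base& b) : Base(b) {}

    double mean() const {
        assert(this->size() > 0);  // the mean of an empty vector is undefined
        double sum = 0.0;
        for (std::size_t i = 0; i < this->size(); ++i) sum += (*this)[i];
        return sum / static_cast<double>(this->size());
    }
};
```

The vector type stays unaware of the statistics layer, so the coupling only goes one way: WithStatistics depends on the interface of its base, not the other way around.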

Alternatively, it might be better to have C-style free functions which take in expressions of vectors and return expressions which evaluate to the mean/variance/standard deviation/etc.
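
A rough sketch of the free-function alternative, under stated assumptions: it works on any container with size() and operator[] (std::vector here, not an Eigen type), it returns scalars rather than lazy expressions, and it uses the population convention (divide by N), which the thread does not specify. The namespace stats_sketch and the parameter name precomputedMean are illustrative only. The two-argument overloads follow the optional precomputed-mean idea from the quoted message below.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace stats_sketch {

// Arithmetic mean of any container exposing size() and operator[].
template <typename Vec>
double mean(const Vec& v) {
    assert(v.size() > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum / static_cast<double>(v.size());
}

// Population variance (divides by N, an assumption). The caller
// supplies a mean it has already computed, so it is not recomputed.
template <typename Vec>
double variance(const Vec& v, double precomputedMean) {
    assert(v.size() > 0);
    double acc = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const double d = v[i] - precomputedMean;
        acc += d * d;
    }
    return acc / static_cast<double>(v.size());
}

// Convenience overload: computes the mean itself.
template <typename Vec>
double variance(const Vec& v) { return variance(v, mean(v)); }

template <typename Vec>
double standardDeviation(const Vec& v, double precomputedMean) {
    return std::sqrt(variance(v, precomputedMean));
}

template <typename Vec>
double standardDeviation(const Vec& v) {
    return standardDeviation(v, mean(v));
}

} // namespace stats_sketch
```

With this shape the vector type needs no knowledge of statistics at all, and a caller who already holds the mean passes it in explicitly instead of relying on a cache.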

Cheers
Ben



--- On Sat, 9/13/08, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

> From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
> Subject: Re: [eigen] first time usage by me! and very satisfied!
> To: eigen@xxxxxxxxxxxxxxxxxxx
> Date: Saturday, September 13, 2008, 9:23 AM
> On Sat, Sep 13, 2008 at 1:27 AM, Schleimer, Ben
> <bensch128@xxxxxxxxx> wrote:
> > Sorry about that. I haven't actually worked with Eigen2 yet. Work is
> > mostly on my mind :( (I have a story about two very badly implemented
> > matrix libraries (in the name of optimization), which is why I'm
> > eager to see Eigen2 be well written.)
> 
> hehe we all have the same expectations here :)
> 
> so what you propose is basically what I imagined, with an additional
> feature allowing the cache mechanism (and its storage) to be removed
> at compile time. So, for instance, v.stats() would typically return a
> Statistics object without a cache, which is good. However, I'm still
> not a fan of this approach.
> 
> Indeed, my main concern is how to combine this approach with the
> column-wise and row-wise stuff.
> For your information, all reduction methods are also available in
> column-wise and row-wise flavors, for instance:
> 
> vec = mat.colwise().sum();
> vec = mat.colwise().minCoeff();
> etc...
> 
> where "mat.colwise().minCoeff()" actually returns an expression.
> 
> So let's pick an example. With my initial proposal you could write:
> 
> Vector means = mat.colwise().mean();
> Vector stddevs = mat.colwise().standardDeviation(means);
> for (....)
> {
>    // use means and stddevs as many times as you want
> }
> 
> here you only allocate what is really needed, and there is zero
> overhead. Of course the user might forget to pass the precomputed mean
> to standardDeviation:
> 
> Vector means = mat.colwise().mean();
> Vector stddevs = mat.colwise().standardDeviation();
> 
> in which case the means are computed twice. IMO, this is the only
> small drawback of this approach. So here comes the caching mechanism.
> 
> On the other hand, with the cache enabled you would either have to
> allocate all the data that might be cached, in which case the
> Statistics object might become very large, or rely on dynamic
> allocations, in which case you still have to store the pointers and
> pay the dynamic allocation cost. Of course you could also have
> compile-time bit flags to tell which data you want to cache, but that
> sounds rather tedious for the user. Another argument is that any time
> you want to access a cached value you still have to pay for an extra
> "if". Furthermore, the user has to take care to access the matrix data
> through the Statistics object so that the cache is automatically
> invalidated...
> 
> So all in all I don't think the small advantage offered by this
> approach outweighs its drawbacks.
> 
> any counter-arguments ?
> 
> gael.
> 
> >>
> >> ah! after answering the rest of the comments I now see how it could
> >> be done. maybe you were thinking of something like this:
> >>
> >> VectorType v;
> >> // play with v..
> >> // now you want some statistics on v:
> >>
> >> Statistics<VectorType> s(v);
> >>
> >> s.mean();
> >> s.variance();
> >> etc.
> >> etc.
> >>
> >> // MatrixBase would define:
> >> // Statistics<Derived> stats() const { return derived(); }
> >> // so that you can also do:
> >>
> >> v.stats().mean();
> >>
> >
> > Actually I was thinking of:
> > template <typename Scalar>
> > struct DirtyCacher
> > {
> >  DirtyCacher() : _dirty(true), _mean() {}
> >  bool dirty() const { return _dirty; }
> >  void setDirty(bool f) { _dirty = f; }
> >  // Recompute the mean only when the cached value is stale.
> >  template <typename Stats>
> >  Scalar mean(const Stats* obj) const
> >  {
> >    if(dirty())
> >    {
> >      _mean = obj->meanInternal();
> >      _dirty = false;
> >    }
> >    return _mean;
> >  }
> >
> >  // mutable: both are updated from the const mean() accessor
> >  mutable bool _dirty;
> >  mutable Scalar _mean;
> > };
> >
> > template <typename Scalar>
> > struct NoCacher
> > {
> >  bool dirty() const { return true; }
> >  void setDirty(bool) { /*nothing*/ }
> >  template <typename Stats>
> >  Scalar mean(const Stats* obj) const
> >  {
> >    return obj->meanInternal();
> >  }
> > };
> >
> > template <typename VectorType,
> >           typename Cacher = DirtyCacher<typename VectorType::Scalar> >
> > class Statistics
> > {
> > public:
> >  typedef typename VectorType::Scalar Scalar;
> >  Statistics(const VectorType& v)
> >  : _v(v) {}
> >  Scalar mean() const
> >  {
> >    return cacher.mean(this);
> >  }
> >  const VectorType& value() const
> >  {
> >    // always assume this isn't changed
> >    return _v;
> >  }
> >  VectorType& value()
> >  {
> >    cacher.setDirty(true); // always assume the data is changed
> >    return _v;
> >  }
> >  ...
> > protected:
> >  Cacher cacher;
> >  VectorType _v;
> >  friend struct DirtyCacher<Scalar>;
> >  friend struct NoCacher<Scalar>;
> >  Scalar meanInternal() const
> >  {
> >    // always calculate the mean here
> >    Scalar ret = 0;
> >    for(size_t i=0; i<_v.size(); ++i)
> >    {
> >      ret += _v[i];
> >    }
> >    return ret / _v.size();
> >  }
> > };
> >
> > Basically, if the user chooses to use the DirtyCacher, he should get
> > a mean() lookup speed increase but he'll lose 1+sizeof(Scalar) bytes.
> > If he uses NoCacher, he gets the space back but with a lookup speed
> > loss. If there are more items to cache, then we can use a bitfield to
> > minimize the space loss (e.g. use meanDirty, varianceDirty,
> > sortDirty).
> > Another nice thing about this design is that VectorType doesn't need
> > to know anything about the existence of the Statistics class.
> > Returning the value as non-const could perhaps be sped up if
> > VectorType could indicate whether an operation has modified it. This
> > is just the most straightforward thing I could think of.
> >
> >>
> >> actually the pb is that you don't have access to "*this"... but you
> >> can use the result of any function as a default argument value, for
> >> instance:
> >>
> >> float foo(float v = random()) {...}
> >>
> >> is ok.
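
A small self-contained sketch of that point, using hypothetical names (callCount, nextId, tag) that appear nowhere in Eigen: a default argument may be an ordinary function call, and it is evaluated at every call that omits the argument. It still cannot refer to "this", which is why a member function cannot default a parameter to its own mean().

```cpp
#include <cassert>

// Hypothetical counter, used only to show when the default is evaluated.
inline int& callCount() {
    static int n = 0;
    return n;
}

inline int nextId() { return ++callCount(); }

// The default value is the result of a function call. It is evaluated
// each time tag() is called without an explicit argument.
inline int tag(int id = nextId()) {
    assert(id != 0);  // nextId() starts at 1, so 0 is never produced here
    return id;
}
```

Passing an explicit argument skips the call to nextId() entirely, which is the same behaviour an optional precomputed-mean parameter would rely on.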
> >
> > Ahh, i did not know that... interesting..
> >
> > cheers
> > Ben
> >
> >
> >
> >
> >>
> >> >> I prefer that to something like:
> >> >> Scalar stddev(Scalar*  returnedMean = 0) const;
> >> >> because in case you have already computed the mean value you
> >> >> cannot tell stddev to use it...
> >> >
> >> > This is confusing. What does the function return if
> >> > returnedMean == 0?
> >> > It might be better to cache the mean and not return it in the
> >> > stddev() method. The user has access to it in mean(), no?
> >>
> >> I think I was not clear here. I meant: "I prefer the above approach
> >> rather than the following common solution".
> >> and to be clear, its implementation would be:
> >> Scalar stddev(Scalar*  returnedMean = 0) const
> >> {
> >>    Scalar m = mean();
> >>    if (returnedMean) *returnedMean = m;
> >>    // compute and returns stddev
> >> }
> >>
> >> but again I don't like this approach !!
> >>
> >> >>
> >> >> so to summarize we would add:
> >> >>
> >> >> * mean()
> >> >>
> >> >> * standardDeviation([precomputed_mean])
> >> >> yeah, eventually "standardDeviation" is not too long
> >> >>
> >> >> * variance([precomputed_mean])
> >> >>
> >> >> * median()
> >> >> maybe here we could have an optional argument to tell if the
> >> >> vector is already sorted ??
> >> >
> >> > We could have a cached sorted flag...
> >>
> >> same pb as above
> >>
> >>
> >> gael.
> >>
> >> >>
> >> >> * sort()
> >> >> median needs a sort algo, and here I really mean "sort", not
> >> >> "sorted", so it is an in-place sort
> >> >>
> >> >
> >> >
> >> > Cheers
> >> > Ben
> >> >
> >> >
> >> >>
> >> >> cheers,
> >> >> gael.
> >> >>
> >> >> On Thu, Sep 11, 2008 at 8:29 PM, Andre Krause
> >> >> <post@xxxxxxxxxxxxxxxx> wrote:
> >> >> > dear list, just wanted to let you all know i am using eigen2
> >> >> > alpha 07 for the first time - and i am very satisfied. works
> >> >> > like a charm, with no problems on win32 and visualc++ 2008.
> >> >> >
> >> >> > though i am missing some basic convenience functions like mean,
> >> >> > median, stdev etc. i already heard on #eigen that they are
> >> >> > planned for some future release.
> >> >> >
> >> >> > kind regards,
> >> >> >        andre
> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >
> >
> >


