Re: [eigen] first time usage by me! and very satisfied!



On Sat, Sep 13, 2008 at 1:27 AM, Schleimer, Ben <bensch128@xxxxxxxxx> wrote:
> Sorry about that. I haven't actually worked with Eigen2 yet. Work is mostly on my mind :( I have a story about two very badly implemented matrix libraries (in the name of optimization), which is why I'm eager to see eigen2 be well written.

hehe we all have the same expectations here :)

So what you propose is basically what I had imagined, plus an
additional feature allowing the cache mechanism (and its storage) to
be removed at compile time. So, for instance, v.stats() would
typically return a cache-free Statistics object, which is good.
However, I'm still not a fan of this approach.

Indeed, my main concern is: how do we combine this approach with the
column-wise and row-wise stuff?
For your information, all reduction methods are also available in
column-wise and row-wise flavors, for instance:

vec = mat.colwise().sum();
vec = mat.colwise().minCoeff();
etc...

where "mat.colwise().minCoeff()" actualy returns an expression.

So let's pick an example. With my initial proposal you could write:

Vector means = mat.colwise().mean();
Vector stddevs = mat.colwise().standardDeviation(means);
for (....)
{
   // use means and stddevs as many times as you want
}

Here you only allocate what is really needed, and there is zero
overhead. Of course, the user might forget to pass the precomputed
mean to standardDeviation:

Vector means = mat.colwise().mean();
Vector stddevs = mat.colwise().standardDeviation();

in which case the means are computed twice. IMO this is the only
small drawback of this approach. So here comes the caching mechanism.
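
(To make the no-cache version concrete before discussing the cache,
here is a rough stand-alone sketch of the overload pair; plain C++
over std::vector<float>, deliberately not Eigen's API:)

#include <cmath>
#include <cstddef>
#include <vector>

float mean(const std::vector<float>& v)
{
  float sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
  return sum / v.size();
}

// overload taking a precomputed mean: no redundant pass over the data
float variance(const std::vector<float>& v, float precomputedMean)
{
  float sum2 = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
  {
    float d = v[i] - precomputedMean;
    sum2 += d * d;
  }
  return sum2 / v.size();
}

// convenience overload: recomputes the mean itself
float variance(const std::vector<float>& v) { return variance(v, mean(v)); }

float standardDeviation(const std::vector<float>& v, float precomputedMean)
{ return std::sqrt(variance(v, precomputedMean)); }

float standardDeviation(const std::vector<float>& v)
{ return standardDeviation(v, mean(v)); }

With this shape, standardDeviation(v, m) reuses a mean you already
have, while standardDeviation(v) silently recomputes it, which is
exactly the pitfall above.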

On the other hand, with the cache enabled, you would either have to
allocate storage for all the data that might be cached, in which case
the Statistics object might become very large, or rely on dynamic
allocations, in which case you still have to store the pointers and
pay for the allocation cost. Of course, you could also have
compile-time bit flags telling which data you want to cache, but that
sounds rather tedious for the user. Another argument is that any time
you access a cached value you still have to pay for an extra "if".
Furthermore, the user has to take care to access the matrix data
through the Statistics object so that the cache is automatically
invalidated...
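
To illustrate those last two points with your own sketch (quoted
below), taking VectorXf for concreteness:

VectorXf vec(100);
// ... fill vec ...
Statistics<VectorXf> stats(vec);  // DirtyCacher by default
float m0 = stats.mean();          // computed once, then cached
float m1 = stats.mean();          // cache hit, but still pays the dirty() test

stats.value()[0] = 42.f;          // non-const access marks the cache dirty
float m2 = stats.mean();          // recomputed, fine

vec[0] = 0.f;                     // this write never reaches 'stats' at all
                                  // (it holds its own copy), so the user really
                                  // must go through stats.value() every time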

So all in all, I don't think the little advantage offered by this
approach overcomes its drawbacks.

Any counter-arguments?

gael.

>>
>> ah! after answering the rest of the comments I now see how it could
>> be done. Maybe you were thinking of something like this:
>>
>> VectorType v;
>> // play with v..
>> // now you want some statistics on v:
>>
>> Statistics<VectorType> s(v);
>>
>> s.mean();
>> s.variance();
>> etc.
>> etc.
>>
>> // MatrixBase would define:
>> //   Statistics<Derived> stats() const { return derived(); }
>> // so that you can also do:
>>
>> v.stats().mean();
>>
>
> Actually I was thinking of:
> template <typename Scalar>
> struct DirtyCacher
> {
>   DirtyCacher() : _dirty(true), _mean(Scalar(0)) {}
>
>   bool dirty() const { return _dirty; }
>   void setDirty(bool f) { _dirty = f; }
>
>   // recompute only when the cache is marked dirty
>   template <typename Stats>
>   Scalar mean(const Stats* obj) const
>   {
>     if (dirty())
>     {
>       _mean = obj->meanInternal();
>       _dirty = false;
>     }
>     return _mean;
>   }
>
>   mutable bool _dirty;   // mutable so mean() can stay const
>   mutable Scalar _mean;
> };
>
> template <typename Scalar>
> struct NoCacher
> {
>   bool dirty() const { return true; }
>   void setDirty(bool) { /* nothing */ }
>
>   // always recompute, no storage at all
>   template <typename Stats>
>   Scalar mean(const Stats* obj) const
>   {
>     return obj->meanInternal();
>   }
> };
>
> template <typename VectorType,
>           typename Cacher = DirtyCacher<typename VectorType::Scalar> >
> class Statistics
> {
> public:
>   typedef typename VectorType::Scalar Scalar;
>
>   explicit Statistics(const VectorType& v) : _v(v) {}
>
>   Scalar mean() const
>   {
>     return cacher.mean(this);
>   }
>   const VectorType& value() const
>   {
>     // read-only access: the cache stays valid
>     return _v;
>   }
>   VectorType& value()
>   {
>     cacher.setDirty(true); // always assume the data will be changed
>     return _v;
>   }
>   // ... variance(), median(), etc. along the same lines
>
> protected:
>   template <typename S> friend struct DirtyCacher;
>   template <typename S> friend struct NoCacher;
>
>   Scalar meanInternal() const
>   {
>     // always compute the mean here
>     Scalar ret = Scalar(0);
>     for (int i = 0; i < _v.size(); ++i)
>       ret += _v[i];
>     return ret / _v.size();
>   }
>
>   Cacher cacher;
>   VectorType _v;
> };
>
> Basically, if the user chooses DirtyCacher, he gets a mean() lookup
> speed increase but loses 1 + sizeof(Scalar) bytes of storage. If he
> uses NoCacher, he gets the space back but pays the recomputation cost
> on every lookup. If there are more items to cache, we can use a
> bitfield to minimize the space loss (e.g. meanDirty, varianceDirty,
> sortDirty).
> Also, the other nice thing about this design is that VectorType
> doesn't need to know anything about the existence of the Statistics
> class.
> Returning the value as non-const could maybe be sped up if VectorType
> could indicate whether an operation has actually modified it. This is
> just the most straightforward thing I could think of.
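>
> For instance, choosing the policy would look roughly like this
> (assuming the sketch above compiles as intended and given some
> VectorXf v):
>
> Statistics<VectorXf> cached(v);                     // DirtyCacher by default:
>                                                     // extra bool + Scalar stored,
>                                                     // repeated mean() calls are cheap
> Statistics<VectorXf, NoCacher<float> > uncached(v); // no extra storage, but
>                                                     // mean() walks the data
>                                                     // on every call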
>
>>
>> actually the problem is that you don't have access to "*this"... but
>> you can use the result of any function as a default argument value,
>> for instance:
>>
>> float foo(float v = random()) {...}
>>
>> is ok.
>
> Ahh, I did not know that... interesting...
>
> cheers
> Ben
>
>
>
>
>>
>> >> I prefer that to something like:
>> >> Scalar stddev(Scalar* returnedMean = 0) const;
>> >> because in case you have already computed the mean value you
>> >> cannot tell stddev to use it...
>> >
>> > This is confusing. What does the function return if
>> > returnedMean == 0? It might be better to cache the mean and not
>> > return it in the stddev() method. The user has access to it in
>> > mean(), no?
>>
>> I think I was not clear here. I meant: "I prefer the above approach
>> rather than the following common solution", and to be clear, its
>> implementation would be:
>>
>> Scalar stddev(Scalar* returnedMean = 0) const
>> {
>>    Scalar m = mean();
>>    if (returnedMean) *returnedMean = m;
>>    // compute and return the stddev
>> }
>>
>> but again I don't like this approach !!
>>
>> >>
>> >> so to summarize we would add:
>> >>
>> >> * mean()
>> >>
>> >> * standardDeviation([precomputed_mean])
>> >>   after all, "standardDeviation" is not too long
>> >>
>> >> * variance([precomputed_mean])
>> >>
>> >> * median()
>> >>   maybe here we could have an optional argument to tell whether
>> >>   the vector is already sorted?
>> >
>> > We could have a cached sorted flag...
>>
>> same problem as above
>>
>>
>> gael.
>>
>> >>
>> >> * sort()
>> >>   median needs a sort algorithm, and here I really mean "sort",
>> >>   not "sorted", so it is an in-place sort
>> >>
>> >
>> >
>> > Cheers
>> > Ben
>> >
>> >
>> >>
>> >> cheers,
>> >> gael.
>> >>
>> >> On Thu, Sep 11, 2008 at 8:29 PM, Andre Krause
>> >> <post@xxxxxxxxxxxxxxxx> wrote:
>> >> > Dear list, just wanted to let you all know I am using eigen2
>> >> > alpha 07 for the first time - and I am very satisfied. It works
>> >> > like a charm, with no problems on win32 and Visual C++ 2008.
>> >> >
>> >> > Though I am missing some basic convenience functions like mean,
>> >> > median, stdev, etc. I already heard on #eigen that they are
>> >> > planned for some future release.
>> >> >
>> >> > kind regards,
>> >> >        andre
>> >> >
>> >> >
>> >> >
>> >
>> >
>> >
>
>
>


