Re: [eigen] first time usage by me! and very satisfied!
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] first time usage by me! and very satisfied!
- From: "Schleimer, Ben" <bensch128@xxxxxxxxx>
- Date: Sat, 13 Sep 2008 13:06:55 -0700 (PDT)
Hi Gael,
I see your point about caching. Maybe it's just better to have the user do the caching on a case-by-case basis. My main concern then is the tight coupling between matrix types and statistics methods. It doesn't need to be that way. I really want a mixin mechanism to exist in C++ so the user can choose whether to include the statistics methods (on a case-by-case basis).
Maybe this construction would work:
template <typename Expression>
class Statistics : public Expression // should this be a VectorType?
{
public:
  // Standard VectorType constructors; maybe there need to be
  // default c'tors and init methods
  Statistics<Expression> colwise() const
  {
    // forwards to the base expression's column-wise view
    return Statistics<Expression>(Expression::colwise());
  }
  Statistics<Expression> rowwise() const
  {
    // forwards to the base expression's row-wise view
    return Statistics<Expression>(Expression::rowwise());
  }
  Expression mean() const
  {
    // return an expression for the mean
  }
  Expression standardDeviation() const
  {
    // return an expression for the standard deviation
  }
};
Alternatively, it might be better to have C-style free functions that take expressions of vectors and return expressions that evaluate to the mean, variance, standard deviation, etc.
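A minimal sketch of the free-function alternative, under some loud assumptions: std::vector<double> stands in for an Eigen vector expression, the functions return plain values rather than lazy expressions, and the names mean, variance and standardDeviation are hypothetical rather than existing Eigen API. The variance here is the population variance (divide by N).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical free function: arithmetic mean of a non-empty vector.
inline double mean(const std::vector<double>& v)
{
    assert(!v.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum / static_cast<double>(v.size());
}

// Hypothetical free function: population variance (divides by N, not N - 1).
inline double variance(const std::vector<double>& v)
{
    const double m = mean(v);
    double acc = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        acc += (v[i] - m) * (v[i] - m);
    return acc / static_cast<double>(v.size());
}

// Hypothetical free function: square root of the population variance.
inline double standardDeviation(const std::vector<double>& v)
{
    return std::sqrt(variance(v));
}
```

The vector type needs to know nothing about these functions, which is the decoupling argued for above; the cost is that nothing is cached between calls.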
Cheers
Ben
--- On Sat, 9/13/08, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
> From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
> Subject: Re: [eigen] first time usage by me! and very satisfied!
> To: eigen@xxxxxxxxxxxxxxxxxxx
> Date: Saturday, September 13, 2008, 9:23 AM
> On Sat, Sep 13, 2008 at 1:27 AM, Schleimer, Ben <bensch128@xxxxxxxxx> wrote:
> > Sorry about that. I haven't actually worked with Eigen2 yet. Work is mostly on my mind :( (I have a story about two very badly implemented Matrix libraries (in the name of optimization) which is why I'm eager to see eigen2 be well written.)
>
> hehe we all have the same expectations here :)
>
> so what you propose is basically what I imagined, with an additional
> feature allowing the cache mechanism (and its storage) to be removed at
> compile time. So, for instance, v.stats() would typically return a
> Statistics object without a cache, which is good. However, I'm still
> not a fan of this approach.
>
> Indeed, my main concern is how to combine this approach with the
> column-wise and row-wise operations.
> For your information, all reduction methods are also available in
> column-wise and row-wise flavors, for instance:
>
> vec = mat.colwise().sum();
> vec = mat.colwise().minCoeff();
> etc...
>
> where "mat.colwise().minCoeff()" actually returns an expression.
>
> So let's pick an example. With my initial proposal you could write:
>
> Vector means = mat.colwise().mean();
> Vector stddevs = mat.colwise().standardDeviation(means);
> for (....)
> {
> // use means and stddevs as many times as you want
> }
>
> here you only allocate what is really needed, and there is zero
> overhead. Of course the user might forget to pass the precomputed mean
> to standardDeviation:
>
> Vector means = mat.colwise().mean();
> Vector stddevs = mat.colwise().standardDeviation();
>
> in which case the means are computed twice. IMO, this is the only
> small drawback of this approach. So here comes the caching mechanism.
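The trade-off described above can be sketched in plain C++. This is only an illustration under stated assumptions: Matrix is a hypothetical vector-of-rows stand-in for an Eigen matrix, and colwiseMean and colwiseStandardDeviation are hypothetical free functions mirroring mat.colwise().mean() and mat.colwise().standardDeviation(means); none of this is Eigen's API. The standard deviation is the population one (divide by the number of rows).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vector;
typedef std::vector<Vector> Matrix; // mat[row][col], hypothetical stand-in

// Mean of each column.
inline Vector colwiseMean(const Matrix& mat)
{
    assert(!mat.empty());
    Vector means(mat[0].size(), 0.0);
    for (std::size_t r = 0; r < mat.size(); ++r)
        for (std::size_t c = 0; c < means.size(); ++c)
            means[c] += mat[r][c];
    for (std::size_t c = 0; c < means.size(); ++c)
        means[c] /= static_cast<double>(mat.size());
    return means;
}

// Standard deviation of each column, reusing means the caller already has.
inline Vector colwiseStandardDeviation(const Matrix& mat, const Vector& means)
{
    assert(!mat.empty() && means.size() == mat[0].size());
    Vector out(means.size(), 0.0);
    for (std::size_t r = 0; r < mat.size(); ++r)
        for (std::size_t c = 0; c < means.size(); ++c) {
            const double d = mat[r][c] - means[c];
            out[c] += d * d;
        }
    for (std::size_t c = 0; c < out.size(); ++c)
        out[c] = std::sqrt(out[c] / static_cast<double>(mat.size()));
    return out;
}

// Convenience overload: computes the means itself, so a caller who also
// calls colwiseMean() pays for the means twice.
inline Vector colwiseStandardDeviation(const Matrix& mat)
{
    return colwiseStandardDeviation(mat, colwiseMean(mat));
}
```

Both overloads give the same result; the two-argument form simply lets the caller decide where the means are stored and when they are computed.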
>
> On the other hand, with the cache enabled you would either have to
> allocate all the data that might be cached, in which case the Statistics
> object might become very large, or rely on dynamic allocations, in which
> case you still have to store the pointers and pay for the dynamic
> allocation cost. Of course you could also have compile-time bit flags to
> tell which data you want to cache, but that sounds rather tedious for
> the user. Another argument is that any time you want to access a cached
> value you still have to pay for an extra "if". Furthermore, the user has
> to take care to access the matrix data through the Statistics object so
> that the cache is automatically invalidated...
>
> So all in all I don't think the small advantage offered by this
> approach outweighs its drawbacks.
>
> any counter-arguments ?
>
> gael.
>
> >>
> >> ah ! after answering the rest of the comments I now see how it could
> >> be done. maybe you were thinking of something like that:
> >>
> >> VectorType v;
> >> // play with v..
> >> // now you want some statistics on v:
> >>
> >> Statistics<VectorType> s(v);
> >>
> >> s.mean();
> >> s.variance();
> >> etc.
> >> etc.
> >>
> >> // MatrixBase would define:
> >> // Statistics<Derived> stats() const { return derived(); }
> >> // so that you can also do:
> >>
> >> v.stats().mean();
> >>
> >
> > Actually I was thinking of:
> > template <typename Scalar>
> > struct DirtyCacher
> > {
> >   DirtyCacher() : _dirty(true), _mean() {}
> >   bool dirty() const { return _dirty; }
> >   void setDirty(bool f) { _dirty = f; }
> >   // Stats is the Statistics type; it must provide meanInternal()
> >   template <typename Stats>
> >   Scalar mean(const Stats* obj) const
> >   {
> >     if(dirty())
> >     {
> >       _mean = obj->meanInternal();
> >       _dirty = false;
> >     }
> >     return _mean;
> >   }
> >
> >   // mutable: the cache is updated from the const mean() above
> >   mutable bool _dirty;
> >   mutable Scalar _mean;
> > };
> >
> > template <typename Scalar>
> > struct NoCacher
> > {
> >   bool dirty() const { return true; }
> >   void setDirty(bool) { /*nothing*/ }
> >   template <typename Stats>
> >   Scalar mean(const Stats* obj) const
> >   {
> >     return obj->meanInternal();
> >   }
> > };
> >
> > template <typename VectorType, typename Cacher = DirtyCacher<typename VectorType::Scalar> >
> > class Statistics
> > {
> > public:
> >   typedef typename VectorType::Scalar Scalar;
> >   Statistics(const VectorType& v)
> >     : _v(v) {}
> >   Scalar mean() const
> >   {
> >     return cacher.mean(this);
> >   }
> >   const VectorType& value() const
> >   {
> >     // always assume this isn't changed
> >     return _v;
> >   }
> >   VectorType& value()
> >   {
> >     cacher.setDirty(true); // always assume the data is changed
> >     return _v;
> >   }
> >   ...
> > protected:
> >   Cacher cacher;
> >   VectorType _v;
> >   friend struct DirtyCacher<Scalar>;
> >   friend struct NoCacher<Scalar>;
> >   Scalar meanInternal() const
> >   {
> >     // always calculate the mean here
> >     Scalar ret = 0;
> >     for(size_t i=0; i<_v.size(); ++i)
> >     {
> >       ret += _v[i];
> >     }
> >     return ret / _v.size();
> >   }
> > };
> >
> > Basically, if the user chooses to use the DirtyCacher, he should get a mean() lookup speed increase but he'll lose 1+sizeof(Scalar) bytes. If he uses NoCacher, he gets the space back but with a lookup speed loss. If there are more items to cache, then we can use a bitfield to minimize the space loss (e.g. use meanDirty, varianceDirty, sortDirty).
> > Also the other nice thing about this design is that VectorType doesn't need to know anything about the existence of the Statistics class.
> > Returning the value as non-const could maybe be sped up if VectorType can indicate whether an operation has modified it. This is just the most straightforward thing I could think of.
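A rough sketch of the bitfield idea mentioned above, assuming one dirty bit per cached item packed into a single byte. The class name CachedStats, the flag names and the meanComputations() counter are all hypothetical and exist only for this illustration; std::vector<double> stands in for VectorType. Only the mean is implemented; variance and sort would follow the same pattern with their own bits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flags, one bit per cached item.
enum DirtyFlag { MeanDirty = 1, VarianceDirty = 2, SortDirty = 4, AllDirty = 7 };

class CachedStats
{
public:
    explicit CachedStats(const std::vector<double>& v)
        : _v(v), _dirty(AllDirty), _mean(0.0), _meanComputations(0) {}

    double mean() const
    {
        if (_dirty & MeanDirty) { // the extra "if" paid on every access
            assert(!_v.empty());
            double sum = 0.0;
            for (std::size_t i = 0; i < _v.size(); ++i)
                sum += _v[i];
            _mean = sum / static_cast<double>(_v.size());
            _dirty = static_cast<unsigned char>(_dirty & ~MeanDirty);
            ++_meanComputations;
        }
        return _mean;
    }

    // Non-const access may modify the data, so every cached item is invalidated.
    std::vector<double>& value() { _dirty = AllDirty; return _v; }
    const std::vector<double>& value() const { return _v; }

    // Instrumentation for the example only: how often the mean was recomputed.
    int meanComputations() const { return _meanComputations; }

private:
    std::vector<double> _v;
    mutable unsigned char _dirty;  // all dirty bits share one byte
    mutable double _mean;
    mutable int _meanComputations;
};
```

Repeated calls to mean() recompute nothing until value() is used for non-const access, at which point all bits are set again whether or not the data actually changed.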
> >
> >>
> >> actually the problem is that you don't have access to "*this"... but you
> >> can use the result of any function as default argument value, for
> >> instance:
> >>
> >> float foo(float v = random()) {...}
> >>
> >> is ok.
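To make the default-argument point concrete, here is a small self-contained sketch. The counter g_calls and the function nextValue are hypothetical helpers that replace random() so the behaviour is observable. In C++ the default argument expression is evaluated on each call that omits the argument, and `this` may not appear in a member function's default argument.

```cpp
// Hypothetical counter so the effect of each evaluation is visible.
int g_calls = 0;

// Stand-in for random(): returns 1, 2, 3, ... on successive calls.
float nextValue()
{
    ++g_calls;
    return static_cast<float>(g_calls);
}

// The default argument is evaluated at every call that omits v,
// not once when foo is declared.
float foo(float v = nextValue())
{
    return 2.0f * v;
}
```

Calling foo() twice evaluates nextValue() twice, while foo(10.0f) does not evaluate it at all.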
> >
> > Ahh, i did not know that... interesting..
> >
> > cheers
> > Ben
> >
> >
> >
> >
> >>
> >> >> I prefer that to something like:
> >> >> Scalar stddev(Scalar* returnedMean = 0) const;
> >> >> because in case you have already computed the mean value you cannot
> >> >> tell stddev to use it...
> >> >
> >> > This is confusing. what does the function return if returnedMean == 0?
> >> > It might be better to cache the mean and not return it in the stddev() method. The user has access to it in mean(), no?
> >>
> >> I think I was not clear here. I meant: "I prefer the above approach
> >> rather than the following common solution".
> >> and to be clear, its implementation would be:
> >> Scalar stddev(Scalar* returnedMean = 0) const
> >> {
> >> Scalar m = mean();
> >> if (returnedMean) *returnedMean = m;
> >> // compute and return stddev
> >> }
> >>
> >> but again I don't like this approach !!
> >>
> >> >>
> >> >> so to summarize we would add:
> >> >>
> >> >> * mean()
> >> >>
> >> >> * standardDeviation([precomputed_mean])
> >> >> yeah, eventually "standardDeviation" is not too long
> >> >>
> >> >> * variance([precomputed_mean])
> >> >>
> >> >> * median()
> >> >> maybe here we could have an optional argument to tell if the vector is
> >> >> already sorted ??
> >> >
> >> > We could have a cached sorted flag...
> >>
> >> same pb as above
> >>
> >>
> >> gael.
> >>
> >> >>
> >> >> * sort()
> >> >> median needs a sort algo, and here I really mean "sort", not "sorted",
> >> >> so it's an in-place sort
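The median proposal with an optional "already sorted" argument could look roughly like this. It is a sketch under assumptions: std::vector<double> stands in for an Eigen vector, std::sort plays the role of the in-place sort(), and the signature and the parameter name alreadySorted are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts v in place (a "sort", not a "sorted" copy) unless the caller
// states that it is already sorted, then returns the median.
inline double median(std::vector<double>& v, bool alreadySorted = false)
{
    assert(!v.empty());
    if (!alreadySorted)
        std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    if (n % 2 == 1)
        return v[n / 2];                    // odd count: middle element
    return 0.5 * (v[n / 2 - 1] + v[n / 2]); // even count: mean of the two middle elements
}
```

Because the sort is in place, the caller's vector is reordered as a side effect, and a second call can pass alreadySorted = true to skip the sort. Passing true for unsorted data silently gives a wrong answer, which is the same kind of user-error risk discussed for the precomputed mean.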
> >> >>
> >> >
> >> >
> >> > Cheers
> >> > Ben
> >> >
> >> >
> >> >>
> >> >> cheers,
> >> >> gael.
> >> >>
> >> >> On Thu, Sep 11, 2008 at 8:29 PM, Andre Krause <post@xxxxxxxxxxxxxxxx> wrote:
> >> >> > dear list, just wanted to let you all know i am using eigen2 alpha 07 for the
> >> >> > first time - and i am very satisfied. works like a charm, with no problems on
> >> >> > win32 and visualc++ 2008.
> >> >> >
> >> >> > though i am missing some basic convenience functions like mean, median, stdev
> >> >> > etc. i already heard on #eigen that they are planned for some future release.
> >> >> >
> >> >> > kind regards,
> >> >> > andre