[eigen] overriding scalar_X_op for auto-dif?



We (Matt Hoffman, Michael Malecki, Andrew Gelman, and I,
in the statistics department at Columbia University) have been
integrating Eigen with our own reverse-mode
automatic differentiation and have run into a bit
of a puzzler.  It might be solved if I could
provide a specialization of the functor

Eigen::internal::scalar_product_op<T1, T2>

and its brethren and have it work properly
with higher-level calls to matrix ops.  For instance,
T1 might be double and T2 an auto-dif variable, and
I'd want to be able to perform matrix multiplications
when one matrix holds doubles and the other holds auto-dif vars.

Here's the longer story to provide some context.

With only auto-dif vars in the matrices, Eigen 3
actually runs pretty much out of the box.
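
"Out of the box" after specializing NumTraits for var, that is.
For reference, ours looks roughly like the sketch below (some ops
also want static members such as epsilon(), which I've elided;
the exact set of fields is what my NumTraits doc comments will
be about):

namespace Eigen {
  template <>
  struct NumTraits<stan::agrad::var> {
    typedef stan::agrad::var Real;        // no separate real type
    typedef stan::agrad::var NonInteger;
    typedef stan::agrad::var Nested;
    enum {
      IsComplex = 0,
      IsInteger = 0,
      IsSigned = 1,
      RequireInitialization = 1,   // var is not a POD
      ReadCost = 1,
      AddCost = 1,
      MulCost = 1
    };
  };
}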

I did wind up modifying the Eigen source itself to remove
the std:: qualification from calls such as std::sqrt().
I couldn't override them any other way: I'm overloading
sqrt() for our type, but it's not legal to add overloads
to namespace std (if I understand C++ properly -- I've
only been at it since January).  With the qualification
removed, argument-dependent lookup finds our overload.
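
To make the pattern concrete, here's a minimal sketch (the var
shown is a stand-in for our real auto-dif type):

#include <cmath>

namespace stan { namespace agrad {
  struct var { double val; };          // stand-in for our auto-dif type
  inline var sqrt(const var& v) {      // our overload
    var r; r.val = std::sqrt(v.val); return r;
  }
} }

template <typename T>
T generic_sqrt(const T& x) {
  using std::sqrt;  // brings std::sqrt into scope for double, float, ...
  return sqrt(x);   // unqualified call: ADL finds stan::agrad::sqrt for var
}

A qualified std::sqrt(x) inside a template like this would never
find our overload, which is why I had to strip the qualification.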

I've unit tested all the arithmetic ops, determinants, inverses,
eigenvalues, and the LU and LLT decompositions, and
all the gradients and values pass.  (I'm going to send
some comments on your NumTraits doc in a separate
message.)

Where I'm running into problems is when I want to
mix double values and auto-dif vars.  For convenience,
let var be our auto-dif type.  We overload arithmetic
for it in the usual way, for example:

var operator+(var a, double b);
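
The implementations just promote the double; roughly (this is
a sketch, assuming var is constructible from double):

inline var operator+(const var& a, double b) {
  return a + var(b);  // promote, then reuse var + var
}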

For a first pass, I did the same thing with Eigen matrices
(and vectors):

Matrix<var,Dynamic,Dynamic>
operator*(const Matrix<var,Dynamic,Dynamic>& a,
          const Matrix<double,Dynamic,Dynamic>& b);

This also works fine.  My implementation for now just
promotes the double values to auto-dif variables and
then leans on Eigen's templating.  This isn't the most
efficient approach, but with the right interface in
place, I can optimize later by propagating my own
derivatives for many of the matrix ops.
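
By "promotes," I mean something like this sketch using Eigen's
cast<>() (our real version may differ in details):

Matrix<var,Dynamic,Dynamic>
operator*(const Matrix<var,Dynamic,Dynamic>& a,
          const Matrix<double,Dynamic,Dynamic>& b) {
  // promote b's entries to var, then lean on the var-only product
  return a * b.cast<var>();
}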

Perhaps not surprisingly for the Eigen developers, I run
into problems with transpose:

Matrix<var,Dynamic,Dynamic> a = ...;
Matrix<double,Dynamic,Dynamic> b = ...;
Matrix<var,Dynamic,Dynamic> c = a.transpose() * b;

The problem is neatly summarized in the last two error
messages from the compiler (GCC 4.2 on Mac OS X,
for what it's worth):

----------------------
....
lib/Eigen/src/Core/Product.h:590: instantiated from ‘const typename Eigen::ProductReturnType<Derived, OtherDerived, Eigen::internal::product_type<Derived,OtherDerived>::value>::Type Eigen::MatrixBase<Derived>::operator*(const Eigen::MatrixBase<OtherDerived>&) const [with OtherDerived = Eigen::Matrix<stan::agrad::var, -0x00000000000000001, 1, 0, -0x00000000000000001, 1>, Derived = Eigen::Transpose<Eigen::Matrix<double, -0x00000000000000001, 1, 0, -0x00000000000000001, 1> >]’

src/test/agrad/agrad_eigen_test.cpp:178:   instantiated from here
lib/Eigen/src/Core/CwiseBinaryOp.h:188: error: no match for call to ‘(const Eigen::internal::scalar_product_op<double, stan::agrad::var>) (const double, const stan::agrad::var&)’
---------------------

Eigen's use of expression templates and views introduces
new classes, and hence new operations, such as multiplying
a transposed double matrix by an auto-dif variable matrix.
These, I haven't overloaded.

What I was wondering is whether I could get away with
specializing scalar_product_op and its relatives instead
of overloading the matrix operations.  This would be ideal
from both an efficiency and a simplicity point of view.
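
From the error message, I'd guess the missing piece is something
like the following (an untested sketch; I'm inferring the
mechanism from Eigen's internals, so please correct me if there's
a supported way):

namespace Eigen {
  namespace internal {

    // tell Eigen that double * var yields var, so the generic
    // scalar_product_op<double,var> can call our mixed operator*
    template <>
    struct scalar_product_traits<double, stan::agrad::var> {
      typedef stan::agrad::var ReturnType;
    };

    template <>
    struct scalar_product_traits<stan::agrad::var, double> {
      typedef stan::agrad::var ReturnType;
    };

  }
}

together with the mixed scalar operator*(double, var) overloads
we already have.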

If it makes matters easier, we're happy to restrict ourselves
to dynamic dimensions and matrices containing either auto-dif
variables or double-precision floating point (but not both).

- Bob Carpenter
  Columbia University, Dept. of Statistics

PS:  Our project is an implementation of Hamiltonian Monte
     Carlo sampling from unnormalized probability densities.
     It overcomes issues with correlated parameters nicely,
     but requires gradient calcs.  In our case, these are
     typically Bayesian posteriors from multilevel regression
     models.

PPS:  Thanks for the LGPL license!  We're planning on a BSD license
      and will release our code as soon as we have a working and
      tested first version (the project has two years of funding,
      but we hope to release a first version soon, as in a month
      or two).  I'm happy to share real code if that'd help.

PPPS:  The auto-dif is standalone (in the no-dependencies sense)
       and we'd be happy to contribute whatever we do on the
       integration to your experimental pages.



