[eigen] overriding scalar_X_op for auto-dif?
We (Matt Hoffman, Michael Malecki, Andrew Gelman, and I,
in the statistics department at Columbia University) have been
integrating Eigen with our own reverse-mode
automatic differentiation and have run into a bit
of a puzzler. It might be solved if I could
provide a template specialization of the functor
Eigen::internal::scalar_product_op<T1, T2>
and its brethren and have this work out properly
with higher-level calls to matrix ops. For instance,
T1 might be double and T2 an auto-dif variable, and
I'd want to be able to perform matrix multiplications
when one matrix held doubles and the other auto-dif vars.
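Concretely, here's the kind of expression I'd like to be able
to write (just a sketch of the goal, with stan::agrad::var
standing in for our auto-dif type):

    #include <Eigen/Dense>
    using Eigen::Matrix;
    using Eigen::Dynamic;
    using stan::agrad::var;

    Matrix<var, Dynamic, Dynamic> a(2, 3);     // auto-dif entries
    Matrix<double, Dynamic, Dynamic> b(3, 2);  // plain double entries

    // desired: mixed-scalar product whose result scalar is var
    Matrix<var, Dynamic, Dynamic> c = a * b;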
Here's the longer story to provide some context.
With only auto-dif vars in the matrices, Eigen 3
actually runs pretty much out of the box.
I did wind up modifying the Eigen source itself to remove
the std:: qualification from calls such as std::sqrt(),
because I couldn't override the qualified calls. The problem
is that I'm overloading sqrt() for our type, but I can't add
overloads to namespace std (if I understand C++ properly --
I've only been at it since January).
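To spell out the workaround: a qualified call like std::sqrt(x)
can never resolve to our overload, whereas an unqualified call
can find it through argument-dependent lookup. A minimal sketch
of the idiom (the helper function is illustrative, not our
real code):

    #include <cmath>

    namespace stan { namespace agrad {
      class var;                // our reverse-mode auto-dif scalar
      var sqrt(const var& x);   // our overload, found by ADL
    } }

    template <typename T>
    T root(const T& x) {
      using std::sqrt;  // fall back to std::sqrt for double, float, ...
      return sqrt(x);   // unqualified: ADL picks stan::agrad::sqrt for var
    }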
I've unit tested all the arithmetic ops, determinants, inverses,
eigenvalues, and the LU and LLT decompositions, and
all the gradients and values pass. (I'm going to send
some comments on your doc for NumTraits in a separate
message.)
Where I'm running into problems is when I want to
mix double values and auto-dif vars. For convenience,
let var be our auto-dif var. We override arithmetic
for it in the usual way, such as:
var operator+(var a, double b);
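Spelled out a bit more, the overload set looks roughly like
this (a sketch that elides the tape machinery):

    namespace stan { namespace agrad {
      class var {
      public:
        var(double v);  // implicit promotion of a double to a constant var
        // ... pointer into the reverse-mode tape elided ...
      };
      var operator+(const var& a, const var& b);
      var operator+(const var& a, double b);
      var operator+(double a, const var& b);
      // ... likewise for -, *, and / ...
    } }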
For a first pass, I did the same thing with Eigen matrices
(and vectors):
    Matrix<var,Dynamic,Dynamic>
    operator*(const Matrix<var,Dynamic,Dynamic>& a,
              const Matrix<double,Dynamic,Dynamic>& b);
This also works fine. My implementation for now just
promotes the double values to auto-dif variables and
then leans on Eigen's templating. This isn't the most
efficient way to do things, but I can optimize later
with the right interface in place by propagating my
own derivatives for many of the matrix ops.
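Roughly, the first pass just casts and falls through to the
homogeneous product (promote_to_var is an illustrative name,
not our real code):

    using Eigen::Matrix;
    using Eigen::Dynamic;
    using stan::agrad::var;

    inline Matrix<var, Dynamic, Dynamic>
    promote_to_var(const Matrix<double, Dynamic, Dynamic>& m) {
      return m.cast<var>();  // each double becomes a constant var
    }

    inline Matrix<var, Dynamic, Dynamic>
    operator*(const Matrix<var, Dynamic, Dynamic>& a,
              const Matrix<double, Dynamic, Dynamic>& b) {
      return a * promote_to_var(b);  // now a plain var-by-var product
    }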
Perhaps not surprisingly for the Eigen developers, I run
into problems with transpose:
    Matrix<var,Dynamic,Dynamic> a = ...;
    Matrix<double,Dynamic,Dynamic> b = ...;
    Matrix<var,Dynamic,Dynamic> c = a.transpose() * b;
The problem is neatly summarized in the last two error
messages from the compiler (gcc 4.2 on Mac OS X,
for what it's worth):
----------------------
....
lib/Eigen/src/Core/Product.h:590: instantiated from ‘const typename
Eigen::ProductReturnType<Derived, OtherDerived,
Eigen::internal::product_type<Derived,OtherDerived>::value>::Type
Eigen::MatrixBase<Derived>::operator*(const
Eigen::MatrixBase<OtherDerived>&) const [with OtherDerived =
Eigen::Matrix<stan::agrad::var, -0x00000000000000001, 1, 0,
-0x00000000000000001, 1>, Derived =
Eigen::Transpose<Eigen::Matrix<double, -0x00000000000000001, 1, 0,
-0x00000000000000001, 1> >]’
src/test/agrad/agrad_eigen_test.cpp:178: instantiated from here
lib/Eigen/src/Core/CwiseBinaryOp.h:188: error: no match for call to
‘(const Eigen::internal::scalar_product_op<double, stan::agrad::var>)
(const double, const stan::agrad::var&)’
---------------------
Eigen's use of expression templates and views introduces new
classes, and hence new operations, such as multiplying a
transposed double matrix by an auto-dif variable matrix.
I haven't overridden these.
What I was wondering is whether I could get away with specializing
scalar_product_op and its relatives instead of overloading the
matrix operations. This would be ideal from both an
efficiency and a simplicity point of view.
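To make the question concrete, here's the kind of specialization
I have in mind; I don't know whether this is a sanctioned
extension point or whether it plays well with Eigen's traits and
vectorization machinery, which is exactly what I'm asking:

    namespace Eigen { namespace internal {

      template <>
      struct scalar_product_op<double, stan::agrad::var> {
        typedef stan::agrad::var result_type;
        const result_type operator()(const double& a,
                                     const stan::agrad::var& b) const {
          return a * b;  // our mixed-scalar operator*
        }
      };

      template <>
      struct scalar_product_op<stan::agrad::var, double> {
        typedef stan::agrad::var result_type;
        const result_type operator()(const stan::agrad::var& a,
                                     const double& b) const {
          return a * b;
        }
      };

    } }

I'm guessing we might also need to tell the traits machinery that
the mixed return type is var (perhaps via
internal::scalar_product_traits), but I'm not sure which trait the
product code actually consults.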
If it makes matters easier, we're happy to restrict ourselves
to dynamic dimensions and matrices containing either auto-dif
variables or double-precision floating point (but not both).
- Bob Carpenter
Columbia University, Dept. of Statistics
PS: Our project is an implementation of Hamiltonian Monte
Carlo sampling from unnormalized probability densities.
It overcomes issues with correlated parameters nicely,
but requires gradient calcs. In our case, these are
typically Bayesian posteriors from multilevel regression
models.
PPS: Thanks for the LGPL license! We're planning on a BSD license
and will release our code as soon as we have a working and
tested first version (the project has two years of funding,
but we hope to release a first version soon, as in a month
or two). I'm happy to share real code if that'd help.
PPPS: The auto-dif is standalone (in the no-dependencies sense),
and we'd be happy to contribute whatever we do for integration
to your experimental pages.