Re: [eigen] Componentwise Operations on an Arbitrary Number of Tensors


For reference, here is a bugzilla entry for this feature request:

An idea we had back then was to introduce Eigen::tie and then allow assignments like:

    Eigen::ArrayXd A, B, X;
    Eigen::tie(A,B) = sin(X), cos(X);

(And of course, one could introduce operators such as sincos, minmax, ... that are assignable to tie-expressions.)
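
To make the intent concrete, here is a rough sketch of the loop such a tie-assignment would be expected to boil down to. It uses plain std::vector rather than Eigen types, and sincos_fused is a hypothetical helper name, not an existing Eigen function:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): fills a and b from x in one pass,
// which is roughly what "tie(A,B) = sin(X), cos(X)" would have to generate.
inline void sincos_fused(const std::vector<double>& x,
                         std::vector<double>& a,
                         std::vector<double>& b) {
    // The outputs are expected to be pre-sized like x.
    assert(a.size() == x.size() && b.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double xi = x[i];  // x[i] is read once and used for both results
        a[i] = std::sin(xi);
        b[i] = std::cos(xi);
    }
}
```

The point is only that X is traversed once instead of twice; how the expression syntax would map onto this loop is left open here.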


On 28.12.2016 at 22:48, Gael Guennebaud wrote:

It seems that what you're looking for is a means to merge multiple
evaluation loops of the same size into a single one (the fact that they run
on the GPU is not really important here). Actually, this need already
shows up for stuff like:

a = vec.minCoeff();
b = vec.maxCoeff();

that currently requires two loops. I remember that we already talked about
that with Benoit S., and I don't think there is a general solution
implemented in the Tensor module yet.
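
For comparison, outside of Eigen the standard library already offers a single-pass variant of this particular pair of reductions. A minimal sketch on a plain std::vector (min_max_one_pass is just an illustrative name):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns {min, max} of a non-empty vector using a single traversal,
// instead of one loop for the minimum and another for the maximum.
inline std::pair<double, double> min_max_one_pass(const std::vector<double>& vec) {
    assert(!vec.empty());
    // std::minmax_element walks the range once and tracks both extremes.
    auto mm = std::minmax_element(vec.begin(), vec.end());
    return std::make_pair(*mm.first, *mm.second);
}
```

This only covers the min/max case; it does not address the general problem of merging arbitrary evaluation loops.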

Technically, I don't think that's very difficult though. The main
difficulty is perhaps on the API side. We could imagine something like:

auto E1 = (R1.deferred() = expr1);
auto E2 = (R2.deferred() = expr2);
merged_eval(E1, E2, ...);

that would essentially generate:

(parallel/GPU/whatever) for loop {
  R1[i] = expr1.coeff(i);
  R2[i] = expr2.coeff(i);
}

In Eigen/Core, "R.deferred().operator=(expr)" would return an
Eigen::internal::Assignment expression (without calling run) that would be
merged by the merged_eval function.
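
A toy version of this idea, written against std::vector and lambdas rather than Eigen expressions, might look as follows. DeferredAssign and defer are hypothetical stand-ins for what "R.deferred() = expr" would return, and this merged_eval takes the loop length explicitly; none of this is existing Eigen API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the result of "R.deferred() = expr": it records
// the destination and a coefficient functor, but runs no loop by itself.
template <typename Expr>
struct DeferredAssign {
    std::vector<double>& dst;
    Expr expr;  // callable: expr(i) yields coefficient i
    void assign_coeff(std::size_t i) {
        assert(i < dst.size());
        dst[i] = expr(i);
    }
};

// Hypothetical factory so the functor type can be deduced.
template <typename Expr>
DeferredAssign<Expr> defer(std::vector<double>& dst, Expr expr) {
    return DeferredAssign<Expr>{dst, expr};
}

// Runs all deferred assignments inside one loop of length n.
template <typename... Assigns>
void merged_eval(std::size_t n, Assigns... assigns) {
    for (std::size_t i = 0; i < n; ++i) {
        // The pack expansion in a braced initializer is evaluated left to
        // right, so R1[i] is written before R2[i], and so on.
        int expand[] = {0, (assigns.assign_coeff(i), 0)...};
        (void)expand;
    }
}
```

With this, something like merged_eval(n, defer(R1, f1), defer(R2, f2)) writes both results in one pass. A real implementation would additionally have to deal with vectorization, aliasing and checking that all sizes agree.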


On Wed, Dec 28, 2016 at 3:22 PM, Graham Neubig <gneubig@xxxxxxxxxx> wrote:

Hi Eigen Folks,

First, thanks for the great library. We're using it in our machine
learning library DyNet to great success.

I had a quick question about something that seems like it should be
possible, but I haven't found a reference. I currently have code here:

That implements the "Adam" update rule for stochastic gradient descent
found in this paper:

Here, all places with "tvec()" are Eigen one-dimensional Tensors. The
thing that bugs me here is that I'm calling 4 different operations, which
results in 4 different GPU kernel launches, for an operation that is
inherently componentwise. If possible, I'd like to be able to basically
create a single functor that takes 4 floats, and modifies them
appropriately, then pass this in a single GPU operation.

I know this is possible using binaryExpr() for binary expressions, but I
couldn't find it for operations with a larger number of arguments. Is there
any chance that there is an elegant way to do this within Eigen (i.e.
without writing my own kernel)?
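
To illustrate what such a fused update would compute, here is a CPU sketch of one Adam step done in a single loop over four arrays. It follows the usual textbook formulation of Adam and is not the DyNet code; AdamConfig and adam_step_fused are names made up for this example:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Adam hyperparameters; the struct and field names are illustrative only.
struct AdamConfig {
    double lr;
    double beta1;
    double beta2;
    double eps;
};

// One Adam step over all components in a single loop.
// theta: parameters, g: gradient, m/v: first and second moment estimates,
// t: 1-based step count used for bias correction.
inline void adam_step_fused(std::vector<double>& theta,
                            const std::vector<double>& g,
                            std::vector<double>& m,
                            std::vector<double>& v,
                            const AdamConfig& c, int t) {
    assert(theta.size() == g.size() && m.size() == g.size() && v.size() == g.size());
    const double corr1 = 1.0 - std::pow(c.beta1, t);
    const double corr2 = 1.0 - std::pow(c.beta2, t);
    for (std::size_t i = 0; i < g.size(); ++i) {
        // All four values for component i are touched in the same iteration.
        m[i] = c.beta1 * m[i] + (1.0 - c.beta1) * g[i];
        v[i] = c.beta2 * v[i] + (1.0 - c.beta2) * g[i] * g[i];
        const double m_hat = m[i] / corr1;
        const double v_hat = v[i] / corr2;
        theta[i] -= c.lr * m_hat / (std::sqrt(v_hat) + c.eps);
    }
}
```

On a GPU the loop body would correspond to the per-element functor of a single kernel, which is what the question is asking Eigen to generate.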


 Dipl. Inf., Dipl. Math. Christoph Hertzberg

 Universität Bremen
 FB 3 - Mathematik und Informatik
 AG Robotik
 Robert-Hooke-Straße 1
 28359 Bremen, Germany

 Switchboard: +49 421 178 45-6611

 Visiting address of the branch office:
 Robert-Hooke-Straße 5
 28359 Bremen, Germany

 Phone:     +49 421 178 45-4021
 Reception: +49 421 178 45-6600
 Fax:       +49 421 178 45-4150
 E-Mail:    chtz@xxxxxxxxxxxxxxxxxxxxxxxx
