Re: [eigen] Componentwise Operations on an Arbitrary Number of Tensors



Hi,

it seems that what you're looking for is a means to merge multiple evaluation loops of the same size into a single one (the fact that they run on the GPU is not really important here). Actually, this need already shows up for things like:

a = vec.minCoeff();
b = vec.maxCoeff();

which currently requires two loops. I remember we already discussed this with Benoit S., and I don't think there is a general solution implemented in the Tensor module yet.
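Concretely, a merged evaluation of those two reductions amounts to something like the following hand-written single pass (a sketch only, not an existing Eigen API):

#include <Eigen/Core>
#include <limits>

// Sketch: compute the min and the max of 'vec' in one pass, which is what a
// merged evaluation of vec.minCoeff() and vec.maxCoeff() would boil down to.
void fused_min_max(const Eigen::VectorXf& vec, float& a, float& b)
{
  a = std::numeric_limits<float>::max();
  b = std::numeric_limits<float>::lowest();
  for (Eigen::Index i = 0; i < vec.size(); ++i) {
    const float x = vec[i];
    if (x < a) a = x;
    if (x > b) b = x;
  }
}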

Technically, I don't think that's very difficult though. The main difficulty is perhaps on the API side. We could imagine something like:

auto E1 = (R1.deferred() = expr1);
auto E2 = (R2.deferred() = expr2);
...
merged_eval(E1, E2, ...);

that would essentially generate:

(parallel/GPU/whatever) for loop {
  R1[i] = expr1.coeff(i);
  R2[i] = expr2.coeff(i);
  ...
}

In Eigen/Core, "R.deferred().operator=(expr)" would return an Eigen::internal::Assignment expression (without calling run) that would be merged by the merged_eval function.
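To make the mechanism a bit more concrete, here is a rough self-contained sketch; DeferredAssign, make_deferred and this merged_eval are purely illustrative names, nothing like this exists in Eigen today:

#include <Eigen/Core>

// Illustrative only: a "deferred assignment" remembers its destination and
// its unevaluated expression instead of running the assignment loop itself.
template <typename Dst, typename Src>
struct DeferredAssign {
  Dst& dst;
  const Src src;  // Eigen expressions are cheap to copy
  void evalCoeff(Eigen::Index i) { dst.coeffRef(i) = src.coeff(i); }
};

template <typename Dst, typename Src>
DeferredAssign<Dst, Src> make_deferred(Dst& dst, const Src& src) {
  return DeferredAssign<Dst, Src>{dst, src};
}

// A single loop over all coefficients; each deferred assignment writes its
// own destination. In practice this loop would be the parallel/GPU kernel.
template <typename... Assignments>
void merged_eval(Eigen::Index size, Assignments... assignments) {
  for (Eigen::Index i = 0; i < size; ++i) {
    int expand[] = { (assignments.evalCoeff(i), 0)... };
    (void)expand;
  }
}

A call would then look like merged_eval(n, make_deferred(R1, expr1), make_deferred(R2, expr2)), with all results produced by one loop.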


gael


On Wed, Dec 28, 2016 at 3:22 PM, Graham Neubig <gneubig@xxxxxxxxxx> wrote:
Hi Eigen Folks,

First, thanks for the great library. We're using it in our machine learning library DyNet to great success.

I had a quick question about something that seems like it should be possible, but I haven't found a reference. I currently have code here:
https://github.com/clab/dynet/blob/master/dynet/training.cc#L280

which implements the "Adam" update rule for stochastic gradient descent described in this paper:
https://arxiv.org/abs/1412.6980

Here, all places with "tvec()" are Eigen one-dimensional Tensors. The thing that bugs me is that I'm calling 4 different operations, which results in 4 different GPU kernel launches, for an update that is inherently componentwise. If possible, I'd like to create a single functor that takes 4 floats and modifies them appropriately, then apply it in a single GPU operation.
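For reference, the componentwise Adam step from that paper, written as the kind of 4-argument functor described above (a sketch only; the member and hyper-parameter names are illustrative, not DyNet's):

#include <cmath>

// One Adam step on a single coefficient of the parameter, its gradient, and
// the first/second moment accumulators. beta1, beta2, alpha (learning rate),
// eps and the bias corrections follow https://arxiv.org/abs/1412.6980.
struct AdamStep {
  float alpha, beta1, beta2, eps;
  float bias1, bias2;  // 1 - beta1^t and 1 - beta2^t, precomputed per step

  void operator()(float& param, float g, float& m, float& v) const {
    m = beta1 * m + (1.f - beta1) * g;
    v = beta2 * v + (1.f - beta2) * g * g;
    const float m_hat = m / bias1;
    const float v_hat = v / bias2;
    param -= alpha * m_hat / (std::sqrt(v_hat) + eps);
  }
};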

I know this is possible using binaryExpr() for binary expressions, but I couldn't find it for operations with a larger number of arguments. Is there any chance that there is an elegant way to do this within Eigen (i.e. without writing my own kernel)?
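For comparison, this is the binaryExpr() pattern mentioned above, which fuses into a single evaluation but tops out at two inputs (shown here with dense Arrays for brevity; the Tensor module exposes a similar binaryExpr, if I recall correctly):

#include <Eigen/Core>

// Example binary functor: an axpy-like update combining two coefficients.
struct Axpy {
  float alpha;
  float operator()(float x, float y) const { return x + alpha * y; }
};

void example() {
  Eigen::ArrayXf x = Eigen::ArrayXf::Random(8);
  Eigen::ArrayXf g = Eigen::ArrayXf::Random(8);
  // One fused evaluation loop, but limited to two operands.
  Eigen::ArrayXf updated = x.binaryExpr(g, Axpy{0.1f});
}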

Graham


