Re: [eigen] Componentwise Operations on an Arbitrary Number of Tensors


*To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
*Subject*: Re: [eigen] Componentwise Operations on an Arbitrary Number of Tensors
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Wed, 28 Dec 2016 22:48:03 +0100

Hi,

It seems that what you're looking for is a means to merge multiple evaluation loops of the same size into a single one (the fact that they run on the GPU is not really important here). Actually, this need already shows up for things like:

a = vec.minCoeff();
b = vec.maxCoeff();

which currently require two loops. I remember we already discussed this with Benoit S., and I don't think a general solution is implemented in the Tensor module yet.
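For the 1-D example above, the fused evaluation amounts to a hand-written single pass of roughly the following shape (a plain-loop sketch, not an existing Eigen API):

// Hand-fused reduction: one traversal computes both results, whereas
// vec.minCoeff() followed by vec.maxCoeff() traverses vec twice.
#include <Eigen/Dense>

void fused_min_max(const Eigen::VectorXf& vec, float& a, float& b) {
  a = b = vec(0);                      // assumes vec.size() > 0
  for (Eigen::Index i = 1; i < vec.size(); ++i) {
    const float x = vec(i);
    if (x < a) a = x;                  // a = vec.minCoeff()
    if (x > b) b = x;                  // b = vec.maxCoeff()
  }
}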

Technically, I don't think that's very difficult though. The main difficulty is perhaps on the API side. We could imagine something like:

auto E1 = (R1.deferred() = expr1);
auto E2 = (R2.deferred() = expr2);
...
merged_eval(E1, E2, ...);

that would essentially generate:

(parallel/GPU/whatever) for loop {
  R1[i] = expr1.coeff(i);
  R2[i] = expr2.coeff(i);
  ...
}

In Eigen/Core, "R.deferred().operator=(expr)" would return an Eigen::internal::Assignment expression (without calling run) that would be merged by the merged_eval function.
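To make the mechanism concrete, here is a toy sketch of the loop merging itself, using plain callables instead of Eigen's internal Assignment expressions; the names merged_eval, E1, and E2 follow the proposal above, everything else is purely illustrative:

// Toy illustration only: each "deferred assignment" is a callable that
// writes coefficient i, and merged_eval evaluates all of them in one loop.
#include <Eigen/Dense>

template <typename... Assign>
void merged_eval(Eigen::Index n, Assign... assign) {
  for (Eigen::Index i = 0; i < n; ++i) {
    int expand[] = { (assign(i), 0)... };  // C++11 pack expansion
    (void)expand;
  }
}

// Usage: two destinations filled with a single traversal.
// auto E1 = [&](Eigen::Index i) { R1(i) = a(i) + b(i); };
// auto E2 = [&](Eigen::Index i) { R2(i) = a(i) * b(i); };
// merged_eval(R1.size(), E1, E2);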

gael

On Wed, Dec 28, 2016 at 3:22 PM, Graham Neubig <gneubig@xxxxxxxxxx> wrote:

Hi Eigen Folks,

First, thanks for the great library. We're using it in our machine learning library DyNet to great success.

I had a quick question about something that seems like it should be possible, but I haven't found a reference. I currently have code here:

That implements the "Adam" update rule for stochastic gradient descent found in this paper:

Here, all places with "tvec()" are Eigen one-dimensional Tensors. The thing that bugs me here is that I'm calling 4 different operations, which results in 4 different GPU kernel launches, for an operation that is inherently componentwise. If possible, I'd like to be able to basically create a single functor that takes 4 floats and modifies them appropriately, then pass this in a single GPU operation.

I know this is possible using binaryExpr() for binary expressions, but I couldn't find it for operations with a larger number of arguments. Is there any chance that there is an elegant way to do this within Eigen (i.e. without writing my own kernel)?

Graham
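For illustration, the fused componentwise update being asked about boils down to a single loop of roughly the following shape; the raw-array form, the parameter names, and the omission of Adam's bias correction are simplifications rather than the actual DyNet code.

// Sketch of one fused pass over the four arrays involved in an Adam-style
// step, instead of four separate componentwise kernel launches.
#include <cmath>
#include <cstddef>

void fused_adam_step(float* w, const float* g, float* m, float* v,
                     std::size_t n, float lr, float b1, float b2, float eps) {
  for (std::size_t i = 0; i < n; ++i) {
    m[i] = b1 * m[i] + (1.f - b1) * g[i];         // first moment
    v[i] = b2 * v[i] + (1.f - b2) * g[i] * g[i];  // second moment
    w[i] -= lr * m[i] / (std::sqrt(v[i]) + eps);  // parameter update
  }
}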

**Follow-Ups**:
- **Re: [eigen] Componentwise Operations on an Arbitrary Number of Tensors**, *From:* Graham Neubig
- **Re: [eigen] Componentwise Operations on an Arbitrary Number of Tensors**, *From:* Ilja Honkonen
- **Re: [eigen] Componentwise Operations on an Arbitrary Number of Tensors**, *From:* Christoph Hertzberg

**References**:
- **[eigen] Componentwise Operations on an Arbitrary Number of Tensors**, *From:* Graham Neubig
