[eigen] Implementation of a TensorMultiMap



I would like to implement a "TensorMultiMap" that takes as input an array of pointers to multiple Tensors, so that one can perform operations on the combined set of Tensors without first having to allocate new memory and perform a series of `concatenate` operations.

The problem I am running into is that I have two arrays of Tensors (dynamic in length, but equal in size; e.g., std::vector<Tensor> tensor1 and std::vector<Tensor> tensor2) whose elements I would like to efficiently multiply pairwise, summing the results into a single output Tensor.  I can accomplish this easily enough with a for loop, but because I am not able to use auto, an evaluation must be made at each iteration of the loop.  With CUDA, each evaluation launches a new kernel, which drastically impacts performance.

I have experimented with using a recursive function, but unfortunately this does not work with CUDA 11 (the code compiles, but the stream never syncs).

Is a "TensorMultiMap" possible?  If so, how best could it be implemented?


Douglas McCloskey, PhD
Group Leader, AutoFlow
Liaison, Information Services/Computational Biology
DTU Biosustain
Technical University of Denmark
Novo Nordisk Foundation Center for Biosustainability
Building 220, Room 218
2800 Kgs. Lyngby
