[eigen] Implementation of a TensorMultiMap
I would like to implement a "TensorMultiMap" that takes as input an array of pointers to the data of multiple Tensors, so that operations can be performed on the combined set of Tensors without first allocating new memory and performing a series of `concatenate` operations.
The problem I am running into is that I have two arrays of Tensors (of dynamic length, but equal in size), e.g., `std::vector<Tensor> tensor1` and `std::vector<Tensor> tensor2`, and I would like to efficiently multiply them pairwise (`tensor1[i]` with `tensor2[i]`) and sum the results into a single output Tensor. I can accomplish this easily enough with a for loop, but because I am not able to use `auto` to keep the accumulated expression unevaluated across iterations, an evaluation must be made at each iteration of the loop. With CUDA, each evaluation results in a new kernel launch, which drastically impacts performance.
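To make the target computation concrete, here is a minimal CPU-only sketch in plain C++. It does not use the Eigen Tensor API; the names `PointerArrayView` and `fused_multiply_sum` are hypothetical, and it assumes the multiplication is coefficient-wise and that every Tensor is stored as a contiguous `float` buffer of the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a "TensorMultiMap": a non-owning view over
// several equally sized buffers, stored as an array of raw pointers.
struct PointerArrayView {
    std::vector<const float*> buffers;  // one pointer per tensor, no copies
    std::size_t length;                 // number of coefficients per tensor
};

// Computes out[i] = sum over k of lhs.buffers[k][i] * rhs.buffers[k][i]
// in a single pass over the output.
// Assumption: coefficient-wise product; all buffers hold `length` floats.
void fused_multiply_sum(const PointerArrayView& lhs,
                        const PointerArrayView& rhs,
                        float* out) {
    assert(lhs.buffers.size() == rhs.buffers.size());
    assert(lhs.length == rhs.length);
    for (std::size_t i = 0; i < lhs.length; ++i) {
        float acc = 0.0f;
        // Accumulate across all tensor pairs for this coefficient.
        for (std::size_t k = 0; k < lhs.buffers.size(); ++k) {
            acc += lhs.buffers[k][i] * rhs.buffers[k][i];
        }
        out[i] = acc;
    }
}
```

On a GPU, the body of the outer loop would correspond to the work of one thread, so the whole sum could in principle be done in one kernel launch rather than one launch per pair of Tensors.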
I have experimented with using a recursive function, but unfortunately this does not work with CUDA 11 (the code compiles, but the stream never synchronizes).
Is a "TensorMultiMap" possible? If so, how best could it be implemented?