Adding support for OpenCL/SYCL to the tensor module
would make a number of people very happy as it would enable them
to use tensors on a much broader set of GPUs. However, I am
not aware of any existing effort to work on this, so your
contribution would be very welcome.
The main problem with plain OpenCL is that the computation
kernels have to be written and compiled separately from the
host code. Since SYCL supports having both in a single C++
source file, it's a much better fit for Eigen.
The best way to start is to look at the GPUDevice class
(it really should be called CudaDevice, since it only
supports CUDA). It provides a simple abstraction on top of
CUDA streams to launch kernels. It also implements an API,
common to all the devices, for basic operations such as
memory allocation and memcpy. You'll need to create a SYCL
device that provides the same services on top of OpenCL.
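To give an idea of the shape of such a device, here is a
minimal sketch built on SYCL 2020 unified shared memory. The
member names (allocate, deallocate, memcpy, memset) mirror
what GPUDevice exposes, but treat the exact interface as an
assumption to check against the tensor module headers; the
real device also needs entry points such as
memcpyHostToDevice and memcpyDeviceToHost.

  // Minimal sketch of a SyclDevice, assuming SYCL 2020 unified
  // shared memory. Member names mirror GPUDevice but are
  // assumptions, not the definitive interface.
  #include <sycl/sycl.hpp>
  #include <cstddef>

  struct SyclDevice {
    explicit SyclDevice(const sycl::queue& q) : queue_(q) {}

    // Allocate and free device memory.
    void* allocate(std::size_t num_bytes) const {
      return sycl::malloc_device(num_bytes, queue_);
    }
    void deallocate(void* buffer) const {
      sycl::free(buffer, queue_);
    }

    // Synchronous copy and fill, analogous to cudaMemcpy/cudaMemset.
    void memcpy(void* dst, const void* src, std::size_t n) const {
      queue_.memcpy(dst, src, n).wait();
    }
    void memset(void* buffer, int c, std::size_t n) const {
      queue_.memset(buffer, c, n).wait();
    }

    sycl::queue& queue() const { return queue_; }

   private:
    // The queue plays the role of the CUDA stream.
    mutable sycl::queue queue_;
  };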
The second step would be to create a specialization of
TensorExecutor for your SYCL device. The job of the tensor
executor is to schedule and launch the computation on the
target device. The executor for the GPUDevice simply shards
the computation over the streaming multiprocessors of a
CUDA GPU.
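For the flavor of it, here is a rough sketch of what that
specialization could look like, reusing the hypothetical
SyclDevice above. The evaluator calls (evalSubExprsIfNeeded,
evalScalar, cleanup) follow the pattern of the default
executor; the SYCL launch itself is only illustrative, since
the hard part in practice is making the evaluator and the
pointers it holds usable from inside a SYCL kernel.

  // Rough sketch, meant to live in namespace Eigen::internal
  // next to the existing executors. One work-item evaluates
  // one output coefficient.
  template <typename Expression>
  class TensorExecutor<Expression, SyclDevice, /*Vectorizable=*/false> {
   public:
    typedef typename Expression::Index Index;

    static void run(const Expression& expr, const SyclDevice& device) {
      TensorEvaluator<Expression, SyclDevice> evaluator(expr, device);
      if (evaluator.evalSubExprsIfNeeded(NULL)) {
        const Index size = array_prod(evaluator.dimensions());
        // Shard the evaluation across the work-items of the SYCL
        // queue, the way the CUDA executor shards it across the
        // streaming multiprocessors.
        device.queue().submit([&](sycl::handler& cgh) {
          cgh.parallel_for(sycl::range<1>(size), [=](sycl::id<1> i) {
            // Copy the (device-copyable, by assumption) evaluator
            // so we can call its non-const evalScalar().
            TensorEvaluator<Expression, SyclDevice> eval = evaluator;
            eval.evalScalar(static_cast<Index>(i[0]));
          });
        }).wait();
      }
      evaluator.cleanup();
    }
  };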
With these two components in place, the entire tensor
module should work. However, the performance of some
operations (such as contractions) will be disappointing. If
that's a problem, I can help you write an optimized
implementation of contractions for OpenCL, probably
following the approach from TensorContractionCuda.h.
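For reference, the approach in TensorContractionCuda.h is
essentially a GEMM-style kernel that stages tiles of the two
operands in fast on-chip (shared/local) memory. A toy SYCL
version of that idea, for square matrices in USM memory,
might look like the following; the names, tile size and
layout are purely illustrative and not Eigen code.

  // Toy tiled matrix multiply: C = A * B, all n x n row-major,
  // n a multiple of kTile, A/B/C are USM device pointers.
  #include <sycl/sycl.hpp>

  constexpr int kTile = 16;

  void tiled_matmul(sycl::queue& q, const float* A, const float* B,
                    float* C, int n) {
    q.submit([&](sycl::handler& cgh) {
      // Local (on-chip) tiles, the SYCL equivalent of CUDA shared memory.
      sycl::local_accessor<float, 2> tA(sycl::range<2>(kTile, kTile), cgh);
      sycl::local_accessor<float, 2> tB(sycl::range<2>(kTile, kTile), cgh);
      cgh.parallel_for(
          sycl::nd_range<2>(sycl::range<2>(n, n),
                            sycl::range<2>(kTile, kTile)),
          [=](sycl::nd_item<2> it) {
            const int row = static_cast<int>(it.get_global_id(0));
            const int col = static_cast<int>(it.get_global_id(1));
            const int lr = static_cast<int>(it.get_local_id(0));
            const int lc = static_cast<int>(it.get_local_id(1));
            float acc = 0.f;
            for (int t = 0; t < n; t += kTile) {
              // Stage one tile of each operand in local memory.
              tA[lr][lc] = A[row * n + (t + lc)];
              tB[lr][lc] = B[(t + lr) * n + col];
              sycl::group_barrier(it.get_group());
              for (int k = 0; k < kTile; ++k)
                acc += tA[lr][k] * tB[k][lc];
              sycl::group_barrier(it.get_group());
            }
            C[row * n + col] = acc;
          });
    }).wait();
  }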