Re: [eigen] Adding support for AMD GPUs in Eigen

On 7 July 2018 at 01:31, Deven Desai <deven.desai.amd@xxxxxxxxx> wrote:

Hi Vincent,

We have not done any benchmarking of Eigen with AMD GPUs yet. I am currently focusing on getting the functionality in place and implementing all the updates requested in the PR feedback so that we can get the PR merged. I expect to be able to do some benchmarking once that is done.

Running on an AMD GPU should only require passing "-DEIGEN_USE_HIP" to the compiler (for code that pulls in Eigen header files). Everything else should be similar to what you would do for Nvidia GPUs.
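
As a rough illustration of how that flag reaches the code, here is a minimal sketch. It assumes only that EIGEN_USE_HIP and EIGEN_USE_GPU are ordinary preprocessor macros set on the command line. The helper eigen_gpu_flags is hypothetical and not part of Eigen, and no Eigen header is included, so this shows the mechanism rather than the real GPU code path.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT an Eigen API): lists which of the GPU-related
// macros mentioned in this thread were defined for this translation unit,
// for example via -DEIGEN_USE_HIP on the compiler command line.
std::string eigen_gpu_flags() {
  std::string flags;
#if defined(EIGEN_USE_HIP)
  flags += "EIGEN_USE_HIP ";
#endif
#if defined(EIGEN_USE_GPU)
  flags += "EIGEN_USE_GPU ";
#endif
  if (flags.empty()) {
    flags = "none";  // neither macro was passed to the compiler
  }
  assert(!flags.empty());  // postcondition: always returns a description
  return flags;
}
```

Built with a plain host compiler and no extra flags, the helper returns "none"; adding -DEIGEN_USE_HIP to the command line makes it list that macro instead.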

Getting TensorFlow to work with AMD GPUs requires a lot of other changes in addition to this change in Eigen. There is a separate ongoing project tasked with getting TensorFlow to work on AMD GPUs. Let me know if you need more information.

Thanks

deven

On Wed, Jul 4, 2018 at 4:57 AM Vincent Hui <vincenthk007@xxxxxxxxxx> wrote:
Hi Deven,

Thank you for your contribution. Did you benchmark Eigen with and without an AMD GPU? Can you give us instructions on how to use Eigen with an AMD GPU? I have an AMD GPU, so I can try it. Furthermore, did you benchmark TensorFlow with and without an AMD GPU after you added AMD GPU support to Eigen?

Thanks a lot,
Vincent

On 7 June 2018 at 04:44, Deven Desai <deven.desai.amd@xxxxxxxxx> wrote:

PR submitted - https://bitbucket.org/eigen/eigen/pull-requests/402/adding-support-for-using-eigen-in-hip/diff

Jason : Thank you for your response. I hope you will find the initial level of HIP support to your liking.

Vincent : Yes, AMD GPU support is similar to what exists for CUDA / OpenCL.

Thanks

deven

On Thu, May 17, 2018 at 7:05 AM Vincent Hui <vincenthk007@xxxxxxxxx> wrote:
Hi Deven,

Is the AMD GPU support similar to Eigen's OpenCL hardware support?

Thanks,
Vincent

On 17 May 2018 at 17:53, Jason Newton <nevion@xxxxxxxxx> wrote:
Just had to drop in and say cool! It's great to see HIP support
spread through the ecosystem.

I've tried to use Eigen a few times in CUDA and I realized a few problems:

-Solvers that could execute on the GPU didn't, because of dynamic
allocations happening somewhere, and I couldn't figure out how to
prevent them. This was for things like a batched QR solve of small
matrices. The allocations may never actually have happened at run time,
but the problem is that they'd be referenced in the device-side compile,
somewhere deep. I think at the time I was looking at either the SVD or
QR solvers.

-It wasn't as flexible as I first hoped. Unfortunately, there are a lot
of strategies you can use to evaluate matrix operations in parallel at
the warp, block, or device level, and this is outside of what Eigen
offers. If it were trying to be a device-side library with the
flexibility that makes sense there, it should offer them for maximum
performance. The cutlass library takes this to the extreme for matrix
multiplication:
https://devblogs.nvidia.com/cutlass-linear-algebra-cuda/
https://github.com/NVIDIA/cutlass . To clarify what I mean by
flexibility: not just exploiting the hierarchy via tiling, but choosing
between simpler multiplication techniques given smaller dimensions,
layout, and the amount of shared memory desired (or registers
sacrificed), and choosing how to extract the parallelism in such
evaluations.

This means that each thread has to do all its work individually. That
can be somewhat reasonable, depending on the problem's/kernel's
needs.

As for building it with CUDA support, it autodetects the NVCC compiler
through the common macros that the compiler defines (__NVCC__
and the like). You have to explicitly disable it if you're compiling
with NVCC and don't want it (I've had errors and turned it off
occasionally when using Eigen in nvcc on the host side).
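
To make the autodetection idea concrete, here is a minimal sketch of checking compiler-defined macros. It relies only on macros that the toolchains themselves predefine (__NVCC__ and __CUDACC__ for CUDA compilation, __HIPCC__ for hipcc). The function gpu_compiler is a hypothetical name for illustration, and this is not a copy of the checks Eigen actually performs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT an Eigen API): names the compiler driver that
// built this translation unit, using only macros predefined by the toolchain.
std::string gpu_compiler() {
#if defined(__HIPCC__)
  return "hipcc";      // AMD HIP compiler driver
#elif defined(__NVCC__)
  return "nvcc";       // NVIDIA CUDA compiler driver
#elif defined(__CUDACC__)
  return "cuda";       // CUDA source compiled by a non-nvcc compiler
#else
  return "host-only";  // ordinary C++ compiler, no GPU toolchain macros
#endif
}

// Small self-check: every branch above returns a non-empty name.
inline void check_gpu_compiler() {
  assert(!gpu_compiler().empty());
}
```

A library can branch on the same macros to enable or skip GPU-specific code, which is why the detection needs no build-system configuration. Note that __CUDA_ARCH__ is a separate macro, defined only while device code is being compiled, so it answers a different question from the ones checked here.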

I don't know anything about the unit tests, sorry. I also haven't
been watching for any recent changes so my experiences may also be a
little out of date.

I am not a core dev, but what I have seen and used in the past for the
project is to submit PRs to https://bitbucket.org/eigen/eigen/ - I of
course leave plenty of room for any stakeholders to clarify any of the
other questions you asked.

-Jason

On Wed, May 16, 2018 at 4:52 PM, Deven Desai <deven.desai.amd@xxxxxxxxx> wrote:
> Hi All,
>
> I am a software development engineer at AMD and we are currently working on
> enabling support for AMD GPUs in Eigen.
>
>
> We envision that support for the AMD GPUs can be implemented in a fashion
> similar to what has already been done for NVidia with CUDA. I have some
> initial questions w.r.t. this task:
>
>
> 1. What is the purpose of the "EIGEN_USE_GPU" macro in the codebase? I see a
> lot of code that is guarded by the EIGEN_CUDACC (guards code that uses CUDA
> extensions) and EIGEN_CUDA_ARCH (guards code that is expected to execute on
> the device) macros, which I think I understand. What I am not clear about is
> the need/use for the EIGEN_USE_GPU macro.
>
> 2. How do I configure cmake to
> - build Eigen with GPU / CUDA support?
> - enable all the unit tests that target the GPU/CUDA?
> I want to make sure that our implementation is consistent with what is
> already in place for CUDA, and hence the need to understand the CUDA
> implementation.
> Any information regarding this will be very helpful.
>
>
> 3. What is the correct protocol to use for upstreaming our code (once done)
> to the Eigen codebase? Will a simple pull request suffice, or do we need to
> do something more? Are there acceptance criteria or a checklist we need to
> complete before we can issue the PR?
>
>
> Please let me know if this is not the correct forum to address these
> questions (and point me to the right one :) ). I expect to have quite a
> few more questions in the coming days, as we
>
>
> Thanks
>
> deven