Re: [eigen] Adding support for AMD GPUs in Eigen

To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Adding support for AMD GPUs in Eigen
From: Jason Newton <nevion@xxxxxxxxx>
Date: Thu, 17 May 2018 02:53:06 -0700

Just had to drop in and say cool!  It's great to see HIP support
spread through the ecosystem.

I've tried to use Eigen a few times in CUDA and ran into a few problems:

-Solvers that could execute on the GPU didn't, because of dynamic
allocations happening somewhere, and I couldn't figure out how to make
that not happen.  This was for things like a batched QR solve of small
matrices.  The allocations may never actually have happened at run
time, but the problem is that they'd be referenced in the device-side
compile, somewhere deep.  I think at the time I was looking at either
the SVD or QR solvers.

-It wasn't as flexible as I first hoped.  There are a lot of
strategies you can use to evaluate matrix operations with warp-,
block-, or device-level parallelism, and this is outside of what Eigen
offers.  If Eigen were trying to be a device-side library, it would
need that kind of flexibility for maximum performance.  The cutlass
library takes this to the extreme for matrix multiplication:
https://devblogs.nvidia.com/cutlass-linear-algebra-cuda/
https://github.com/NVIDIA/cutlass .  To clarify what I mean by
flexibility, I don't just mean exploiting the hierarchy via tiling but
also choosing between simpler multiplication techniques given smaller
dimensions, layout, and the amount of shared memory desired (or
registers sacrificed), and choosing how to extract the parallelism in
such evaluations.
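
As a rough illustration of picking the evaluation strategy from the
problem size, here is a minimal host-side C++ sketch.  It is not how
Eigen or cutlass do it; the names (multiply, multiply_simple,
multiply_tiled, kSmallDim, kTile) and the cutoff values are
hypothetical.  On a real device the tiled path would map tiles onto
shared memory and warps or blocks, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

template <std::size_t N>
using Mat = std::array<std::array<float, N>, N>;

constexpr std::size_t kSmallDim = 4;  // assumed cutoff, not a tuned value
constexpr std::size_t kTile = 4;      // assumed tile edge, not a tuned value

// Plain triple loop: minimal bookkeeping, reasonable for tiny matrices.
template <std::size_t N>
void multiply_simple(const Mat<N>& a, const Mat<N>& b, Mat<N>& c) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < N; ++k) acc += a[i][k] * b[k][j];
      c[i][j] = acc;
    }
}

// Blocked loop: walks kTile x kTile sub-blocks so each block of a and b
// is reused while it is "hot".  c must not alias a or b.
template <std::size_t N>
void multiply_tiled(const Mat<N>& a, const Mat<N>& b, Mat<N>& c) {
  for (auto& row : c) row.fill(0.0f);
  for (std::size_t i0 = 0; i0 < N; i0 += kTile)
    for (std::size_t j0 = 0; j0 < N; j0 += kTile)
      for (std::size_t k0 = 0; k0 < N; k0 += kTile)
        for (std::size_t i = i0; i < std::min(i0 + kTile, N); ++i)
          for (std::size_t j = j0; j < std::min(j0 + kTile, N); ++j)
            for (std::size_t k = k0; k < std::min(k0 + kTile, N); ++k)
              c[i][j] += a[i][k] * b[k][j];
}

// N is a compile-time constant, so the compiler can fold this branch.
template <std::size_t N>
void multiply(const Mat<N>& a, const Mat<N>& b, Mat<N>& c) {
  if (N <= kSmallDim) {
    multiply_simple(a, b, c);
  } else {
    multiply_tiled(a, b, c);
  }
}
```

The same idea extends to selecting on layout or on a shared-memory
budget: the choice is made from compile-time parameters rather than
being fixed by the library.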

Without that flexibility, each thread has to do all of its work
individually.  That can be somewhat reasonable, depending on the needs
of the problem and the kernel.
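
To make the "each thread does its own work" pattern concrete, here is
a minimal sketch in plain C++ where a loop index stands in for the
thread id and every "thread" solves its own small system.  All sizes
are fixed at compile time and all storage is on the stack, so nothing
references an allocator.  This is not Eigen's implementation:
solve_fixed and solve_batch are hypothetical helpers, and the solve
uses Gaussian elimination with partial pivoting rather than QR to keep
the example short.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>

template <std::size_t N>
using Vec = std::array<double, N>;
template <std::size_t N>
using Mat = std::array<Vec<N>, N>;

// Hypothetical helper: solves a x = b for one small N x N system.
// a and b are taken by value and modified in place; no heap is used.
// Returns false if the matrix is (numerically) singular.
template <std::size_t N>
bool solve_fixed(Mat<N> a, Vec<N> b, Vec<N>& x) {
  for (std::size_t col = 0; col < N; ++col) {
    // Partial pivoting: pick the row with the largest entry in this column.
    std::size_t pivot = col;
    for (std::size_t r = col + 1; r < N; ++r)
      if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
    if (std::fabs(a[pivot][col]) < 1e-12) return false;
    std::swap(a[col], a[pivot]);
    std::swap(b[col], b[pivot]);
    // Eliminate the entries below the pivot.
    for (std::size_t r = col + 1; r < N; ++r) {
      const double f = a[r][col] / a[col][col];
      for (std::size_t c = col; c < N; ++c) a[r][c] -= f * a[col][c];
      b[r] -= f * b[col];
    }
  }
  // Back substitution on the upper-triangular system.
  for (std::size_t i = N; i-- > 0;) {
    double s = b[i];
    for (std::size_t c = i + 1; c < N; ++c) s -= a[i][c] * x[c];
    x[i] = s / a[i][i];
  }
  return true;
}

// Host stand-in for a kernel launch: tid plays the role of a thread id,
// and each iteration works on its own system without cooperation.
template <std::size_t N, std::size_t Count>
void solve_batch(const std::array<Mat<N>, Count>& as,
                 const std::array<Vec<N>, Count>& bs,
                 std::array<Vec<N>, Count>& xs,
                 std::array<bool, Count>& ok) {
  for (std::size_t tid = 0; tid < Count; ++tid)
    ok[tid] = solve_fixed<N>(as[tid], bs[tid], xs[tid]);
}
```

In a CUDA or HIP kernel the loop would disappear and tid would come
from the thread and block indices; the per-thread body would stay the
same.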

As for building it with CUDA support, Eigen autodetects the NVCC
compiler through the common macros that compiler defines (__NVCC__
and the like).  If you're compiling with NVCC and don't want the CUDA
support, you have to disable it explicitly (I've hit errors and
occasionally turned it off when using Eigen in NVCC on the host side).
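
The detection pattern looks roughly like the sketch below.  The
MYLIB_* macro names are hypothetical and chosen only for illustration;
Eigen's own macros are different and should be checked in the version
you use.  The sketch keys on __CUDACC__, which nvcc defines while
compiling CUDA source files, because that is when the __host__ and
__device__ annotations are available.

```cpp
// Hypothetical macro names, for illustration only.
// Device annotations are enabled when nvcc is compiling CUDA source,
// unless the user opts out by defining MYLIB_NO_CUDA.
#if defined(__CUDACC__) && !defined(MYLIB_NO_CUDA)
#define MYLIB_DEVICE_FUNC __host__ __device__
#define MYLIB_HAS_DEVICE_ANNOTATIONS 1
#else
#define MYLIB_DEVICE_FUNC
#define MYLIB_HAS_DEVICE_ANNOTATIONS 0
#endif

// Callable from host code everywhere, and also from device code when
// the annotations are enabled.
MYLIB_DEVICE_FUNC inline int triple(int v) { return 3 * v; }
```

With a scheme like this, passing -DMYLIB_NO_CUDA on the command line
forces the plain host path even when the file is built by nvcc.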

I don't know anything about the unit tests, sorry.  I also haven't
been watching for any recent changes so my experiences may also be a
little out of date.

I am not a core dev, but what I have seen and used in the past for the
project is submitting PRs to https://bitbucket.org/eigen/eigen/ .  I
of course leave plenty of room for any stakeholders to answer the
other questions you asked.

-Jason

On Wed, May 16, 2018 at 4:52 PM, Deven Desai <deven.desai.amd@xxxxxxxxx> wrote:
> Hi All,
>
> I am a software development engineer at AMD and we are currently working on
> enabling support for AMD GPUs in Eigen.
>
>
> We envision that support for AMD GPUs can be implemented in a fashion
> similar to what has already been done for NVidia with CUDA.  I have some
> initial questions w.r.t. this task:
>
>
> 1. What is the purpose of the "EIGEN_USE_GPU" macro in the codebase? I see a
> lot of code that is guarded by the EIGEN_CUDACC (guards code that uses CUDA
> extensions) and EIGEN_CUDA_ARCH (guards code that is expected to execute on
> the device) macros, which I think I understand. What I am not clear about is
> the need/use for the EIGEN_USE_GPU macro.
>
> 2. How do I configure cmake to
>    - build Eigen with GPU / CUDA support?
>    - enable all the unit tests that target the GPU/CUDA?
> I want to make sure that our implementation is consistent with what is
> already in place for CUDA, and hence the need to understand the CUDA
> implementation.
> Any information regarding this will be very helpful.
>
>
> 3. What is the correct protocol to use for upstreaming our code (once done)
> to the Eigen codebase? Will a simple pull request suffice, or do we need to
> do something more? Are there acceptance criteria or a checklist we need to
> complete before we can issue the PR?
>
>
> Please let me know if this is not the correct forum to address these
> questions (and point me to the right one :) )  I expect to have quite a
> few more questions in the coming days, as we
>
>
> Thanks
>
> deven
