[eigen] Using Eigen in CUDA kernels
- To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: [eigen] Using Eigen in CUDA kernels
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 7 Feb 2013 19:21:58 +0100
Good news: NVCC (the CUDA compiler from NVidia) version 5 is now able
to parse and compile Eigen :) That means using Eigen in our CUDA
kernels is "just" a matter of qualifying all our functions callable
from CUDA with __device__ when compiling with nvcc. As a proof of concept
I've created a fork there:
This port is far from complete, but I've been able to do stupid stuff like:
Vector3f v = M.inverse() * Map<Vector3f>(data+tid*3);
output[tid] = v.cwiseAbs().maxCoeff();
where data and output are device buffers. The generated
assembly looked reasonable.
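To make the snippet concrete, here is a minimal sketch of what such a kernel could look like (the kernel name, signature, and launch setup are my own hypothetical additions; it assumes the relevant Eigen functions have been __device__-qualified as described above):

```cuda
#include <Eigen/Dense>
using namespace Eigen;

// Hypothetical kernel: applies the inverse of a 3x3 matrix M to each
// input 3-vector and stores the largest absolute coefficient.
// M_data, data, and output all point to device memory.
__global__ void transform_kernel(const float* M_data, const float* data,
                                 float* output, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid >= n) return;
  Map<const Matrix3f> M(M_data);
  // Map views the tid-th 3-vector in place, no copy needed.
  Vector3f v = M.inverse() * Map<const Vector3f>(data + tid * 3);
  output[tid] = v.cwiseAbs().maxCoeff();
}
```

Since the matrix is small and fixed-size, inverse() expands to a closed-form cofactor computation, which is why the generated assembly stays reasonable.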
So no more crappy home-made fixed-size matrix and vector classes!
This also means that evaluating very large *coefficient-wise
expressions* on CUDA should not be very difficult. We just need a
special DeviceMatrix storing its data in device memory, and specialize
its assignment operator to call a CUDA kernel templated with the
destination and source expression...
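A rough sketch of that idea (DeviceMatrix, assign_kernel, and the launch configuration are all hypothetical names for illustration, not part of Eigen):

```cuda
// Hypothetical sketch: each thread evaluates one coefficient of the
// source expression and writes it into the destination device buffer.
template <typename Scalar, typename Src>
__global__ void assign_kernel(Scalar* dst, const Src src, int size)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < size) dst[i] = src.coeff(i);
}

// Hypothetical DeviceMatrix: its data lives in device memory, and its
// assignment operator launches the kernel above, templated on the
// right-hand-side expression type.
template <typename Scalar>
class DeviceMatrix
{
  Scalar* m_data; // allocated with cudaMalloc
  int m_size;
public:
  template <typename Expr>
  DeviceMatrix& operator=(const Expr& expr)
  {
    const int threads = 256;
    const int blocks = (m_size + threads - 1) / threads;
    // The expression object is passed by value to the kernel, so its
    // coeff() members must be __device__-callable for this to compile.
    assign_kernel<<<blocks, threads>>>(m_data, expr, m_size);
    return *this;
  }
};
```

The key point is that the whole expression tree is a plain value type, so it can be copied to the device as a kernel argument and evaluated coefficient by coefficient without any temporary.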