Re: [eigen] (co-)mentoring for Google Summer of Code

On Wed, Feb 11, 2015 at 2:46 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

On Wed, Feb 11, 2015 at 12:02 PM, Schleimer, Ben <bensch128@xxxxxxxxx> wrote:
Hi Gael,
I believe that once an OpenCL kernel is built, it can be reused as many times as you want. I was under the impression that the device keeps the kernel in device memory, but I could be wrong.

My problem was to find a way to store the generated kernels and then be able to find them. Perhaps a static variable of a function templated by the whole expression type will do. Then at the first run of an expression, we still have to generate the kernel source code from the expression tree, uniquely name the leaves (i.e., the matrices), and compile the kernel.
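A minimal sketch of that "static variable in a function templated by the expression type" idea, in plain C++ with no OpenCL dependency. All names here (CompiledKernel, generate_and_compile, kernel_for, and the Matrix/Sum/Product node types) are hypothetical stand-ins, not Eigen or OpenCL API; the real generate-and-compile step is replaced by a counter so the caching behaviour is visible.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a built OpenCL kernel handle.
struct CompiledKernel {
    std::string source;
};

// Counts how often the (expensive) generate-and-compile step actually ran.
inline int& compile_count() {
    static int n = 0;
    return n;
}

// Hypothetical expression-template node types (not Eigen's real ones).
struct Matrix {};
template <typename L, typename R> struct Sum {};
template <typename L, typename R> struct Product {};

// Placeholder for: walk the expression tree, name the leaves,
// emit the kernel source, and build it. Here it only records the call.
template <typename Expr>
CompiledKernel generate_and_compile() {
    ++compile_count();
    return CompiledKernel{std::string("/* kernel for ") +
                          typeid(Expr).name() + " */"};
}

// One function-local static per expression type: initialised on first
// call (thread-safe since C++11) and reused on every later call.
template <typename Expr>
const CompiledKernel& kernel_for() {
    static const CompiledKernel kernel = generate_and_compile<Expr>();
    return kernel;
}
```

Calling kernel_for<Sum<Matrix, Product<Matrix, Matrix>>>() twice returns the same object and runs the compile step only once; a different expression type gets its own cache slot. The lookup is resolved by the compiler, so no run-time map or kernel name is needed to find the kernel again.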

You can look at how Boost.Compute does it: sort of a mix of ahead-of-time and JIT compilation. But you've hit the nail on the head with implicit generation from templates. SYCL [ https://www.khronos.org/opencl/sycl ] will address this, but as of now there's nothing available to work with.

Also, the C++ wrapper for the OpenCL buffer object is fairly straightforward: copy data to the device, run the kernel, copy data back from the device...

Sure, but if you want to avoid useless device-to-host copies followed by host-to-device copies, and keep data on the GPU when possible, then you have to ask the user to manually trigger the copies.
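To make the trade-off concrete, here is a rough sketch of a buffer wrapper where the user triggers the copies explicitly. The device side is simulated with a second std::vector so the example runs without OpenCL; DeviceVector and its members are hypothetical names, and in a real wrapper the two copy functions would presumably map onto the C++ bindings' enqueueWriteBuffer / enqueueReadBuffer calls.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Simulated host/device pair. The "device" storage is just another
// vector here; this is an assumption made so the sketch is runnable.
class DeviceVector {
public:
    explicit DeviceVector(std::vector<float> host)
        : host_(std::move(host)), device_(host_.size(), 0.0f) {}

    // Explicit host-to-device copy, triggered by the user.
    void to_device() { device_ = host_; ++h2d_copies_; }

    // Explicit device-to-host copy, triggered by the user.
    void to_host() { host_ = device_; ++d2h_copies_; }

    // Stand-in for a kernel launch: touches device data only,
    // so consecutive "kernels" need no intermediate copies.
    void scale_on_device(float s) {
        for (std::size_t i = 0; i < device_.size(); ++i) device_[i] *= s;
    }

    const std::vector<float>& host() const { return host_; }
    int h2d_copies() const { return h2d_copies_; }
    int d2h_copies() const { return d2h_copies_; }

private:
    std::vector<float> host_;
    std::vector<float> device_;
    int h2d_copies_ = 0;
    int d2h_copies_ = 0;
};
```

With this shape, running two kernels back to back costs one copy in each direction instead of two, but the host copy is stale until to_host() is called, which is exactly the burden that gets pushed onto the user.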

It's not clear to me why CUDA would pull ahead here. Everything you can do with memory in CUDA you can do with OpenCL too. Managing the run-time side of things is of similar complexity on both (although kernel dispatch is a tad simpler on CUDA; the C++ OpenCL bindings simplify that, though).

-Jason