Hi Gael,
I believe that once an OpenCL kernel is built, it can be reused as many times as you want. I was under the impression that the device keeps the kernel in device memory, but I could be wrong.
Also, the C++ wrapper for the OpenCL buffer object is fairly straightforward: write the data to the device, run the kernel, read the results back from the device...
Sigh, I would volunteer to do this but I'm too busy right now...
so I'm going to shut up now.
Ben
On Wednesday, February 11, 2015 12:05 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
Hi,
Unfortunately, handling OpenCL is much more complicated, as you have to generate and compile the code at runtime, and if the expression is within a loop you would also like to cache the compiled code somehow. You would also have to deal with memory transfers yourself, while CUDA can do it for us.
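
For what it's worth, the "cache the compiled code somehow" part could be sketched roughly as below. This is only an illustration under stated assumptions, not something taken from Eigen or the OpenCL API: `CompiledKernel`, `KernelCache` and the compile callback are hypothetical names, and the callback stands in for whatever actually builds the generated source at runtime. The idea is just to key built kernels on their generated source text, so an expression evaluated in a loop is compiled only on its first use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a built kernel handle (not a real OpenCL type).
struct CompiledKernel {
    std::string source;
};

// Caches built kernels keyed by their generated source text.
class KernelCache {
public:
    // Hypothetical callback that performs the expensive runtime build.
    using Compiler = std::function<CompiledKernel(const std::string&)>;

    explicit KernelCache(Compiler compile) : compile_(std::move(compile)) {
        assert(compile_ && "a compile callback is required");
    }

    // Returns the cached kernel for `source`, building it only on a miss.
    const CompiledKernel& get(const std::string& source) {
        auto it = cache_.find(source);
        if (it == cache_.end()) {
            ++builds_;
            it = cache_.emplace(source, compile_(source)).first;
        }
        return it->second;
    }

    // Number of times the compile callback has actually been invoked.
    std::size_t builds() const { return builds_; }

private:
    Compiler compile_;
    std::unordered_map<std::string, CompiledKernel> cache_;
    std::size_t builds_ = 0;
};
```

Calling `get()` repeatedly with the same source string inside a loop would invoke the callback once and return the stored kernel afterwards. A real implementation would probably want a cheaper key than the full source text (for example, something derived from the expression type), but that is a design choice left open here.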