Re: [eigen] Status of building Eigen with Emscripten (running Eigen code

2016-02-10 16:23 GMT-05:00 Benoit Jacob <jacob.benoit.1@xxxxxxxxxx>:

Indeed, both of the things you mention would be very important improvements, but I am not aware of any plan at the moment:

64bit vs. 32bit is in principle invisible to JavaScript (since there are no pointers), but Emscripten has to pretend that it is a particular arch when compiling existing C++ code, and at the moment it only pretends to be a 32bit arch. For most code, there is little benefit in having Emscripten pretend to be a 64bit arch, but of course, for us it would be very nice as that would mean more registers.

Actually, we could get the benefits of more registers on Emscripten just by tweaking the values of EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS. The problem is that we wouldn't know at compile time how many registers the client machine will have. But we could compile Eigen code twice, and have the Web application choose the right version at runtime, like we do in native code when detecting e.g. AVX.
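
A minimal sketch of the "compile twice, choose at runtime" idea, in plain C++. The two kernels below merely stand in for two builds of the same Eigen code compiled with different values of EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS. The names dot_few_registers, dot_many_registers and select_kernel are hypothetical, the threshold of 16 is only illustrative, and how a Web application would actually learn the client's register count is left open (here it is just a parameter).

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a build tuned for few registers: a single accumulator.
float dot_few_registers(const float* a, const float* b, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
  return acc;
}

// Stand-in for a build tuned for many registers: four independent
// accumulators, which only pays off if they all stay in registers.
float dot_many_registers(const float* a, const float* b, std::size_t n) {
  float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    acc0 += a[i] * b[i];
    acc1 += a[i + 1] * b[i + 1];
    acc2 += a[i + 2] * b[i + 2];
    acc3 += a[i + 3] * b[i + 3];
  }
  for (; i < n; ++i) acc0 += a[i] * b[i];  // leftover elements
  return (acc0 + acc1) + (acc2 + acc3);
}

using DotKernel = float (*)(const float*, const float*, std::size_t);

// Hypothetical selection step: the register count is assumed to be
// supplied by some detection mechanism that is not shown here.
DotKernel select_kernel(int client_register_count) {
  assert(client_register_count > 0);
  return client_register_count >= 16 ? dot_many_registers
                                     : dot_few_registers;
}
```

The caller would run select_kernel once at startup and keep the returned function pointer, much like native code picks an AVX or non-AVX path once.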

I'll draw Emscripten developers' attention to this point. I suppose that a major hurdle to overcome is that JavaScript doesn't have a native uint64 number type, so handling of 64bit pointer arithmetic would be inefficient. As a result, I guess, all pointer arithmetic would be a lot more inefficient. But that's just my uninformed guess. See comments on 64-bit integers in https://github.com/kripken/emscripten/wiki/Code-Generation-Modes
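
To illustrate why 64-bit arithmetic gets costlier without a native 64-bit integer type, here is a sketch of a 64-bit addition built only from 32-bit operations. It shows the general technique, not the code Emscripten actually generates. U64Pair, add64 and to_u64 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit unsigned value stored as two 32-bit halves.
struct U64Pair {
  std::uint32_t lo;
  std::uint32_t hi;
};

// One 64-bit add becomes two 32-bit adds, a comparison and a carry add.
U64Pair add64(U64Pair a, U64Pair b) {
  U64Pair r;
  r.lo = a.lo + b.lo;                             // wraps modulo 2^32
  std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrapped => carry out
  r.hi = a.hi + b.hi + carry;
  return r;
}

// Helper used only to check results against a real 64-bit type.
std::uint64_t to_u64(U64Pair p) {
  return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}
```

Every pointer increment or offset computation would pay this kind of overhead if pointers were 64 bits wide and had to be emulated this way.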

AVX/FMA would be very nice, but at the moment, it looks like the browser world is still trying to ship "any SIMD at all", with the initial target of SSE2 still being in the future: it's only efficiently supported in Firefox Nightly at the moment. So it's probably a matter of, "one thing at a time". There might (or might not) also be a typical concern in the Web world, of introducing compatibility issues when some new feature is only well-supported, or only fast, on a subset of client machines. That kind of concern generally makes the browser world err on the side of being conservative. Which is a good thing in general, just not for us!

Cheers,
Benoit

2016-02-10 16:12 GMT-05:00 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
Hi,

thanks for the benchmark results. Those numbers are pretty impressive. Do you know if there is any plan to support 64-bit mode with AVX/FMA? Because then the native version should be at about 80 GFlop/s.

gael

On Wed, Feb 10, 2016 at 4:59 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
Hi List,

TL;DR: Emscripten currently allows running scalar Eigen code at ~40% of native speed in multiple browsers. SIMD support makes this much better in supporting browsers, but that doesn't include any current stable shipping browser; in current stable browsers, SIMD makes things much *worse*.

I just took another look at running Eigen MatrixXf multiplications in the browser, here is what I found.

Emscripten is now very easy to get started with. Compiling the attached testcase is as easy as:

em++ ~/vrac/eigen-benchmark.cc -I $HOME/eigen -O3 --std=c++11 -Wextra -s TOTAL_MEMORY=30000000 -o eigen-benchmark.html

That is, aside from specifying the memory size or growth policy, there is nothing particular to do. You can then simply point your browser to the resulting eigen-benchmark.html.

I was interested in performance, and in the status of SIMD.

By default, Emscripten emulates a 32-bit arch with no SIMD. For 1024x1024 MatrixXf multiplication, I get:

Native with -m32 -mno-sse: 6.0 GFlop/s
Emscripten'd code in Firefox: 2.6 GFlop/s
Emscripten'd code in Chrome: 2.2 GFlop/s

So we're at roughly 40% of native performance with plain scalar code.
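
For reference, here is a sketch of the arithmetic behind such figures, assuming the usual convention of counting 2*n^3 floating-point operations for an n x n matrix product (one multiply and one add per inner step). This is not the attached testcase, and the helper names matmul_gflops and fraction_of_native are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// GFlop/s for an n x n by n x n product that took `seconds` to run,
// assuming 2*n^3 floating-point operations in total.
double matmul_gflops(double n, double seconds) {
  assert(seconds > 0.0);
  return 2.0 * n * n * n / seconds / 1e9;
}

// Measured throughput as a fraction of the native baseline.
double fraction_of_native(double measured_gflops, double native_gflops) {
  assert(native_gflops > 0.0);
  return measured_gflops / native_gflops;
}
```

With the numbers above, fraction_of_native(2.6, 6.0) is about 0.43 for Firefox and fraction_of_native(2.2, 6.0) is about 0.37 for Chrome, which is where the "roughly 40%" comes from.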

Next, I was interested in SIMD status. Emscripten is gaining the ability to target SIMD.js, simply by passing -msse2 as usual. Unfortunately, this seems to be only supported in Firefox Nightly at the moment, with other browsers at the "intent to implement" stage according to Mozilla documentation. Emscripten generates a polyfill so that SIMD code still "works" everywhere, but that fallback is very, very slow.

Results with SSE2:
Native with -m32 -msse2: 20 GFlop/s
Native with -m64 -msse2: 25 GFlop/s
Emscripten'd code in Firefox Nightly: 11.8 GFlop/s
Emscripten'd code in stable Firefox: 0.0015 GFlop/s
Emscripten'd code in stable Chrome: did not complete benchmark

So the good news is that when SIMD.js is supported (in Firefox Nightly), it runs at 60% of native speed (since we should compare to -m32). The bad news is that enabling SIMD makes things unbearably slow when the fallback is used.

Emscripten bug to track for making the SIMD fallback better: issue 3783

Cheers,
Benoit