Re: [eigen] RFC: making a deterministic and reproducable product codepath with Eigen

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


FYI I came across reproblas a few months ago ( http://bebop.cs.berkeley.edu/reproblas/ ) and it helped me figure out a few of the different goals one can have with the different cases of determinism and reproducibility that exist - I've had pretty much all of them at some point.  The blas package is incomplete unfortunately but I've been interested in trying out a DIY C++ "scalar" for their index type and how Eigen might work with it - of course I've learned a little more about how the BLAS hookup is done in Eigen now and that might be a path to trying that out - Something like 4x memory per scalar for unlimited parallelism scalability and higher precision coming along for the ride - pretty good tradeoff - take a look at their papers sometime if the topic interests you.

One such goal is software that is obligated to always get the same results - a difficult guarantee without such tools if you don't have fixed reduction orderings, and it happens...

Apparently a reemerging topic several corporatations places are starting to take care and discuss with http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/Batched-BLAS-2016/

Part of the reason I brought this topic up is I want strong confidence that my software is correct to where it was written somewhere else (python) and testing any other way is hard/of low value.  I've seen it happen with Kalman filters/state estimation algorithms - they just work in the presence of errors introduced by the programmer (since that can be treated as noise), that is until they don't. Debugging decisions made with things that vary also can be a PITA if they aren't sufficiently reproducable - though it looks like Eigen is in the clear for this category, perhaps even for it's OpenMP path.

Also, with how the blas paths in Eigen have been coded up, it looks like they also disable lazy evaluation - or am I incorrect in that assessment?  Essentially they have to submit the entire problem at once to their appropriate routines.  Since that product path has that capability - a user defined product path could be added (without much being added to Eigen itself - probably just by clearing out via a macro the product selector specializations so the user can override), to allow whatever types of other products, perhaps like ref blas's strategy (definitely a good target along with naive), complex reduction trees, or perhaps a stable blocking based algorithm - while keeping all of Eigen's existing constructs.  Further, it might be possible to make the static dispatch reroute to a dynamic dispatch that uses some flags to control which product type is used - if the user sets that up appropriately.  Perhaps a few of these common ones could go into Unsupported modules.  Or maybe it should be approached via blas, as Gael mentioned earlier.  A short coming is with it's current strategy, it limits to the blas interfaces and float/double though - not quite decided if that's an issue (may be for integers near special cases) - then the user has to define a blas which might break/clash in larger softwares.

-Jason


On Thu, Sep 8, 2016 at 3:45 PM, Peter <list@xxxxxxxxxxxxxxxxx> wrote:
Dar Christoph,
Am 08.09.2016 um 19:06 schrieb Christoph Hertzberg:
On 2016-09-06 16:09, Peter wrote:
It's beyond my knowledge, whether scalar products will always be
scheduled in the same way on the FPUs by the hardware,
especially if the scalar product appears after an if-statement.
I'm just sceptical, but maybe I just got surprised too often.

I would be very surprised, if a CPU would give different results for the very same machine instructions (given the same inputs and FPU-configuration, of course), solely based on internal scheduling.

In case you are interested, there's e.g. HP's Dynamo project, <http://www.hpl.hp.com/techreports/1999/HPL-1999-77.html>,
which messes around with binaries. And for scalar products, it's sufficient to change the order of evaluation,
to loose bit-wise accuracy, eg. the scalar product of ( 1, 1e-50, 1) with ( 1, 1, -1 ) is a simple example.
I'm just not sure how far the processors mess around.

For the given use-case, I guess the simplest solution would be to hand-craft a simple library with naive but stable implementations.

I agree, using a F77 BLAS should be sufficient. Although I still don't understand what one learns from bypassing all optimizations.

If correctness is important one should switch to exact scalar product, like in C-XSC,
which removes the dependence on the order of evaluation and just _has_ to provide the same result everywhere.
BTW, exact scalar products could be an interesting extension to Eigen in some future version,
opening the door to verified computing.

Best regards,
Peter





Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/