|Re: [eigen] RFC: making a deterministic and reproducable product codepath with Eigen|
On 06.09.2016 at 11:08, Jason Newton wrote:
For FMA/SSE and whatnot, you must ensure that part is the same on both implementations, but what matters is that the reduction ordering is kept the same once that part is fixed. It is one more thing, like rounding, that will need
attentiveness - gcc will let you escape the 80-bit register, for instance, by relying on SSE for everything. I was not able to understand how more than one FPU could wriggle itself inside the reduction ordering - at least on a
GPU, I know this won't happen and there are many FPUs there.
GPUs are pretty simple; in contrast, some processors have branch prediction and out-of-order execution in hardware.
It's beyond my knowledge whether scalar products will always be scheduled in the same way on the FPUs by the hardware,
especially if the scalar product appears after an if-statement.
I'm just sceptical, but maybe I just got surprised too often.
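
To make the reduction-ordering point above concrete, here is a minimal sketch (plain C++, no Eigen; the function names sum_forward and sum_backward are made up for this illustration) showing that the same three floats summed in two different orders give two different results. It assumes default IEEE-754 semantics, i.e. no -ffast-math or similar flags that let the compiler reassociate.

```cpp
#include <cassert>
#include <vector>

// Left-to-right sum: ((x0 + x1) + x2) + ...
inline float sum_forward(const std::vector<float>& x) {
    float acc = 0.0f;
    for (float v : x) acc += v;
    return acc;
}

// Right-to-left sum: ((x_{n-1} + x_{n-2}) + ...) + x0
inline float sum_backward(const std::vector<float>& x) {
    float acc = 0.0f;
    for (auto it = x.rbegin(); it != x.rend(); ++it) acc += *it;
    return acc;
}

// Returns true if the two orderings disagree for the given input.
inline bool ordering_matters(const std::vector<float>& x) {
    return sum_forward(x) != sum_backward(x);
}
```

With the input {1.0f, 1e30f, -1e30f}, the forward sum absorbs the 1.0f into 1e30f and ends at 0, while the backward sum cancels the two large values first and ends at 1. The magnitudes are chosen so large that the result should not depend on whether intermediates are kept in float, double or the 80-bit x87 format.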
Itanium I'm not sure of at all, but those are going by the wayside, right?
Yes, it's dead, but it's a prime example where the compiler messes around a lot.
I agree it is possible, at least without managing compiler settings carefully,
OK, with the help of the compiler one might get far.
It also might be good to have a user-pluggable matrix product calculator - this would let you fiddle with the reduction ordering, to try, say, deterministic reduction trees or different blocking/tiling configurations.
Could also be interesting to investigate performance issues.
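
As a rough sketch of what such a pluggable product could look like (this is not Eigen's API; matmul, SequentialReduce and PairwiseTreeReduce are hypothetical names, and the matrices are plain row-major std::vector<double>), the reduction over the inner dimension is handed to a policy object, so the summation order is fixed by the policy rather than by blocking or vectorization choices:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy: plain left-to-right accumulation.
struct SequentialReduce {
    double operator()(std::vector<double>& terms) const {
        double acc = 0.0;
        for (double t : terms) acc += t;
        return acc;
    }
};

// Hypothetical policy: fixed balanced pairwise tree, reduced in place.
// The tree shape depends only on terms.size(), never on thread count
// or tiling, so the rounding sequence is the same on every run.
struct PairwiseTreeReduce {
    double operator()(std::vector<double>& terms) const {
        const std::size_t n = terms.size();
        if (n == 0) return 0.0;
        for (std::size_t stride = 1; stride < n; stride *= 2)
            for (std::size_t i = 0; i + stride < n; i += 2 * stride)
                terms[i] += terms[i + stride];
        return terms[0];
    }
};

// C (m x n) = A (m x k) * B (k x n), all row-major.
// The products are formed first and then handed to the reduction policy.
template <class Reduce>
std::vector<double> matmul(const std::vector<double>& A,
                           const std::vector<double>& B,
                           std::size_t m, std::size_t k, std::size_t n,
                           Reduce reduce) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<double> C(m * n);
    std::vector<double> terms(k);  // scratch buffer, overwritten per entry
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t p = 0; p < k; ++p)
                terms[p] = A[i * k + p] * B[p * n + j];
            C[i * n + j] = reduce(terms);
        }
    return C;
}
```

Calling matmul(A, B, m, k, n, PairwiseTreeReduce{}) and matmul(A, B, m, k, n, SequentialReduce{}) on the same data can give different results, which is exactly the knob being discussed. This sketch makes no attempt at performance, and compiler settings such as FMA contraction would still have to be pinned down separately.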