- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Intermediate Packet Storage
- From: Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 18 Dec 2019 15:19:46 +0100
On 18/12/2019 14.45, Joel Holdsworth wrote:
On 12/18/19 1:08 PM, Christoph Hertzberg wrote:
On 18/12/2019 12.20, Joel Holdsworth wrote:
[...]
const Packet<uint32_t> E = a * b; // Some expensive calculation
const Packet<uint32_t> x = E + c;
const Packet<uint32_t> y = E + d;
This results in the value of E being stored onto the stack, and then
reloaded twice to calculate x and y.
But of course, I just want E to stay in a register.
I'm pretty sure that as long as `E` fits into a (set of) register(s),
no reasonable compiler will store it on the stack unless it actually
runs out of registers; see e.g.:
https://godbolt.org/z/WxEeNF
I don't know if you think ARM GCC 5.4 counts as reasonable, but you can
see the problem occurring here: https://godbolt.org/z/fpvqat
Hm, interesting/unfortunate ...
GCC seems to store and immediately reload what it stored (even after
removing the ASM_COMMENT line, or after replacing the calculations with
much simpler ones):
https://godbolt.org/z/qXmm2i
I'm no ARM expert, but I assume {d{2k}-d{2k+1}} is an alias for `q{k}`.
As I mentioned, my project requires GCC 5. I would be interested to know
if newer versions of ARM GCC have the same issue - but there seems to be
some issue with Eigen on newer versions, because godbolt is giving me
errors.
Yes, I have no idea about what causes this -- maybe some ARM expert can
chip in.
Interestingly, x86 GCC 5.4 seems to do the right thing.
Even if the small intermediate was stored on the stack, I assume the
overhead should be negligible.
It's all just cycles that I'd like to eliminate.
My algorithm has enough cross-linking in the overall evaluation graph
that the loads and stores account for ~30% of all my instructions, once
you include the extra instructions needed to calculate the
stack-pointer-relative addresses.
Ok, fair enough.
The problem is different if you want to apply your expressions to a set
of large arrays at once. Something like the following will very likely
require `E` to be stored or evaluated twice (unless the compiler is really
smart at detecting duplicated code or loads after stores):
ArrayXi a,b,c,d; // input from somewhere
ArrayXi E = a*b; // some expensive operations
ArrayXi x = E+c, y=E+d;
For solving that problem you may be interested in:
https://gitlab.com/libeigen/eigen/issues/984
That certainly seems like a related concept. But is there much prospect
of this getting implemented any time soon?
No promises when (or if) this will be finished. But it seems this would
not directly fix your immediate issue anyway.
Here are some examples of things I would like to do in a single
evaluation pass:
auto alpha = ...
blend = alpha * x + (1.0f - alpha) * y;
limit = (x > 0.f).select(log(x), 0);
auto condition = X(...) && Y(...) && Z(...);
result1 = condition.select(a, b);
result2 = condition.select(c, d);
At the moment, there is a choice between having an ArrayX intermediate,
which will incur a lot of RAM bandwidth and cache eviction, or
calculating the intermediate value twice.
Are you able to implement the above (or something similar) with pure
intrinsics, in a way that GCC 5 properly inlines without storing
intermediates?
If that is not possible, I'd see no way at all to do this with that
compiler. If it is possible, I'd see some hope in implementing the
previously mentioned Meta-Packets.
Cheers,
Christoph
Thanks for your advice.
Joel
--
Dr.-Ing. Christoph Hertzberg
Visiting address of the branch office:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 5
28359 Bremen, Germany
Postal address of the head office, Bremen site:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany
Tel.: +49 421 178 45-4021
Switchboard: +49 421 178 45-0
E-Mail: christoph.hertzberg@xxxxxxx
More information: http://www.dfki.de/robotik
-------------------------------------------------------------
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Trippstadter Straße 122, D-67663 Kaiserslautern, Germany
Managing Directors:
Prof. Dr. Antonio Krüger (Chair)
Dr. Walter Olthoff
Chair of the Supervisory Board:
Dr. Gabriël Clemens
Commercial register: Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------