Re: [eigen] question about ploadu/pstoreu and alignment



On Thu, Sep 18, 2014 at 2:26 PM, Konstantinos Margaritis <markos@xxxxxxxxxxx> wrote:
*) VSX (via the vec_vsx_ld intrinsic / lxvw4x instruction) allows unaligned
access if the offset is a multiple of 4. That is, it correctly loads a
vector from an unaligned source, in either big- or little-endian mode, but
only if the address is a multiple of 4 bytes (32 bits). If it is not,
lxvw4x doesn't work and I have to implement unaligned load/store via a
complicated permute in the little-endian case. Do I have to (or should I)
implement a generic unaligned load that handles arbitrary alignment? I
guess I should, but I had to ask. It would definitely save me time and
effort to only have to worry about word-aligned cases.
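The "complicated permute" refers to the classic AltiVec idiom: load the two aligned 16-byte blocks that straddle the target address and merge them with a byte permute whose control vector comes from the low four address bits (what `lvsl` computes in big-endian mode). The sketch below emulates that idiom in portable scalar C++ so the index arithmetic can be checked on any machine. It assumes big-endian byte numbering within the vector, treats index 0 of the buffer as aligned, and all function names are hypothetical; the little-endian VSX case needs an adjusted permute control and operand order, which this sketch does not cover.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec16 = std::array<std::uint8_t, 16>;

// Emulates an aligned 16-byte load (like lvx / vec_ld): the low four bits of
// the index are ignored, so the load always starts on a 16-byte boundary.
inline Vec16 emulated_aligned_load(const std::uint8_t* buf, std::size_t index) {
    Vec16 v;
    std::memcpy(v.data(), buf + (index & ~std::size_t{15}), v.size());
    return v;
}

// Emulates vperm / vec_perm: each control byte selects one of the 32 bytes
// of the concatenation (a, b); only its low five bits are used.
inline Vec16 emulated_permute(const Vec16& a, const Vec16& b, const Vec16& control) {
    Vec16 out;
    for (std::size_t i = 0; i < 16; ++i) {
        const std::size_t sel = control[i] & 31u;
        out[i] = sel < 16 ? a[sel] : b[sel - 16];
    }
    return out;
}

// Unaligned 16-byte load from any byte index of buf.
// Preconditions: buf holds `size` bytes, size is a multiple of 16,
// and index + 16 <= size.
inline Vec16 emulated_unaligned_load(const std::uint8_t* buf, std::size_t size,
                                     std::size_t index) {
    assert(size % 16 == 0 && index + 16 <= size);
    const Vec16 lo = emulated_aligned_load(buf, index);
    // index + 15 (not + 16): for an aligned index this is the same block as lo.
    const Vec16 hi = emulated_aligned_load(buf, index + 15);
    // Control {s, s+1, ..., s+15} with s = index % 16, as lvsl yields (big-endian).
    Vec16 control;
    const std::size_t shift = index & 15u;
    for (std::size_t i = 0; i < 16; ++i)
        control[i] = static_cast<std::uint8_t>(shift + i);
    return emulated_permute(lo, hi, control);
}
```

Loading the second block at `index + 15` rather than `index + 16` means an already aligned address touches only one block, so the zero-offset case needs no special branch and never reads past the data.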

On all other platforms we assume at least 4-byte alignment, so you should be fine here.
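One way to see why the 4-byte assumption is safe: on mainstream ABIs `float` has 4-byte natural alignment, so the start of any packet taken from a float array is word-aligned even when it is not 16-byte aligned. A minimal sketch, assuming such an ABI; `is_aligned_to` and `packet_start_is_word_aligned` are hypothetical helpers, not part of Eigen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on a multiple of `alignment` bytes.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Assumption: the target ABI gives float 4-byte natural alignment. This holds
// on mainstream platforms but is not required by the C++ standard.
static_assert(alignof(float) == 4, "sketch assumes 4-byte aligned float");

// With a 16-byte aligned base, &data[k] is 16-byte aligned only when k % 4 == 0,
// but it is always at least 4-byte aligned, which is the case the quoted
// report says a word-granular load (lxvw4x) handles.
inline bool packet_start_is_word_aligned(const float* data, std::size_t k) {
    return is_aligned_to(data + k, 4);
}
```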
*) The ploadu/pstoreu tests in packetmath also test the aligned case
(offset 0 in the for loop), which is actually how I found the above
problem. Are ploadu/pstoreu *guaranteed* to be called only on unaligned
addresses, or are they supposed to work in the general case, even if the
source/target is aligned? If the former, then the packetmath test has to
drop the zero-offset case. Otherwise, I would also have to add a special
case in ploadu() to call pload() when the offset is zero (which I think is
redundant, and a serious performance drop).

Right: ploadu/pstoreu are meant for cases where aligned access cannot be guaranteed, but that does not mean the data is never aligned. I don't really see why this would make the implementation of ploadu more difficult or expensive, though.
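Put differently, an unaligned load must also accept addresses that happen to be aligned, and a correct implementation usually handles them without an extra branch. A minimal scalar sketch of that contract, with a check loop in the spirit of the packetmath test that includes offset 0; `Packet4`, `load_aligned`, `load_unaligned` and `check_unaligned_loads` are hypothetical stand-ins, not Eigen's actual `pload`/`ploadu` signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 4-float packet standing in for a SIMD register.
struct Packet4 {
    float v[4];
};

// Stand-in for pload: the caller promises 16-byte alignment.
inline Packet4 load_aligned(const float* from) {
    assert(reinterpret_cast<std::uintptr_t>(from) % 16 == 0);
    Packet4 p;
    std::memcpy(p.v, from, sizeof p.v);
    return p;
}

// Stand-in for ploadu: no alignment promise, so it must also accept addresses
// that happen to be aligned. memcpy is valid for any alignment, so offset 0
// needs no special case.
inline Packet4 load_unaligned(const float* from) {
    Packet4 p;
    std::memcpy(p.v, from, sizeof p.v);
    return p;
}

// For every offset, including 0, the unaligned load must return the same
// elements as plain scalar reads. data must hold at least 7 floats.
inline bool check_unaligned_loads(const float* data) {
    for (int offset = 0; offset < 4; ++offset) {
        const Packet4 p = load_unaligned(data + offset);
        for (int i = 0; i < 4; ++i)
            if (p.v[i] != data[offset + i]) return false;
    }
    return true;
}
```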

P.S. After the VSX port, I'd like to complete NEON support for the new ARMv8.
It seems to work already but does not yet support doubles. It's definitely
going to be easier than the VSX port, though.

great :)
