Re: [eigen] No vectorization in presence of .cast<T>() calls


On 17.12.2010 09:58, Hauke Heibel wrote:
> On Fri, Dec 17, 2010 at 9:48 AM, Christoph Hertzberg <
> chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>> I think you need to mark the map as aligned (if it is aligned).
>> Besides that, wouldn't it sometimes be useful if non-aligned data also got
>> vectorized, e.g. if the load overhead is smaller than the
>> performance gain? (Or is there some kind of cast possible already?)
> No, no, that is not the issue. It does not matter whether the map is aligned
> or not. The advantage of aligned maps is that it is possible to use aligned
> loads which are much faster than the unaligned ones. It does not affect
> vectorization itself.
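
To illustrate the distinction Hauke draws, here is a minimal SSE sketch using plain intrinsics (not Eigen's internal code). The loop handles four floats per instruction whether or not the data is 16-byte aligned; alignment only decides between the aligned load (_mm_load_ps) and the unaligned one (_mm_loadu_ps). In Eigen itself this choice is expressed through the Map's alignment option (e.g. Map<ArrayXf, Aligned>). The function name sum_floats is hypothetical, and a scalar fallback is assumed for non-SSE targets.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

// Hypothetical helper: sums n floats. With SSE the main loop is vectorized
// in both branches; only the load instruction depends on alignment.
float sum_floats(const float* data, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#ifdef HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    const bool aligned = (reinterpret_cast<std::uintptr_t>(data) % 16) == 0;
    if (aligned) {
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(data + i));   // aligned load
    } else {
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));  // unaligned load
    }
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);  // horizontal sum of the four lanes
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail (or the whole array when SSE is unavailable).
    for (; i < n; ++i) total += data[i];
    return total;
}
```

Calling it on an aligned buffer and on the same buffer offset by one float gives the same kind of vectorized loop; only the load instruction differs.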

OK, that was just an uneducated guess. Next guess: could it be that
unsigned char is currently not vectorized at all? I browsed the source a
bit and only found Packet4i, Packet4f and Packet2d.
Have you checked whether ArrayXXf::cast<double>() etc. gets vectorized?

> For the moment I am just changing the code to
> MatrixXf od_full = img_mat.cast<float>();
> od_full = - std::log( od_full.array() / 255.0f +
> std::numeric_limits<float>::epsilon() );
> which leads to no additional temporary and vectorization in the second line.

I think you still lose a bit here with the unnecessary assignment, since
the cast result is first written to memory and then read back.
The optimal approach would be something like:
* Unpack 4 bytes from memory into a register, taking care of unsignedness
  (I don't know the instruction for that, but I'm sure there is one, or a
  combination of several),
* convert it (_mm_cvtepi32_ps) and continue working with that, and
* store the register to memory.
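The three steps above can be sketched with SSE2 intrinsics as follows. SSE2 has no single zero-extending byte-to-int32 instruction, so this sketch unpacks twice against a zero register (u8 to u16, then u16 to i32); SSE4.1 later added _mm_cvtepu8_epi32 (pmovzxbd) for exactly this step. The function name bytes_to_unit_floats and the scaling by 1/255 (taken from Hauke's expression) are illustrative assumptions; the log itself is left out because plain SSE has no vector log.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical helper: converts n unsigned bytes to floats scaled by 1/255.
void bytes_to_unit_floats(const std::uint8_t* in, float* out, std::size_t n) {
    std::size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    const __m128 scale = _mm_set1_ps(1.0f / 255.0f);
    for (; i + 4 <= n; i += 4) {
        std::int32_t four;
        std::memcpy(&four, in + i, 4);        // 1. load 4 bytes...
        __m128i v = _mm_cvtsi32_si128(four);  //    ...into the low 32 bits
        v = _mm_unpacklo_epi8(v, zero);       //    zero-extend u8 -> u16
        v = _mm_unpacklo_epi16(v, zero);      //    zero-extend u16 -> i32
        __m128 f = _mm_cvtepi32_ps(v);        // 2. convert i32 -> float
        f = _mm_mul_ps(f, scale);             //    keep working in registers
        _mm_storeu_ps(out + i, f);            // 3. store 4 floats
    }
#endif
    // Scalar tail (or the whole array when SSE2 is unavailable).
    for (; i < n; ++i) out[i] = in[i] * (1.0f / 255.0f);
}
```

Because the bytes are converted and scaled in registers, no intermediate float array is written and re-read.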

And actually, I guess that in this particular case you would be much
faster with a lookup table (just 256 floats == 1 KiB), but of course
there might be cases where this is no longer applicable.
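A minimal sketch of the lookup-table idea, assuming the same expression as Hauke's workaround, -log(b/255 + epsilon), evaluated once for every possible byte value b. The names make_od_table and apply_od_table are hypothetical. Whether this actually beats a vectorized computation depends on the hardware, so treat it as a candidate to benchmark rather than a guaranteed win.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: precompute -log(b / 255 + eps) for all 256 byte values.
std::array<float, 256> make_od_table() {
    std::array<float, 256> table{};
    const float eps = std::numeric_limits<float>::epsilon();
    for (int b = 0; b < 256; ++b)
        table[b] = -std::log(static_cast<float>(b) / 255.0f + eps);
    return table;
}

// Hypothetical helper: one indexed load per pixel, no log() call at run time.
void apply_od_table(const std::array<float, 256>& table,
                    const std::uint8_t* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = table[in[i]];
}
```

With 4-byte floats the table occupies 1 KiB, small enough to stay in L1 cache while an image is processed.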


Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: (+49) 421-218-64252
