> yes of course we first need to get the vectorization of complex done the
> right way, and I still don't know what's the best option. We can either
> provide our own specialization of std::complex<double> implementation or use
> the ei_p* functions:
>
> Own specialization of std::complex<double>
> * pro:
> - enable its optimization everywhere
> * cons:
> - we and the user have to make sure a std::complex<double> is always
> correctly aligned (easy to do unless the user starts doing fancy casts)
> - on the long term, the next gen CPUs will have 256bit registers, so a
> packet of std::complex<double> won't be made of a single coeff anymore...
>
> Using the ei_p* with packet of 1 coeff
> * pro:
> - no need to specialize std::complex<double>
> * cons:
> - need to enable the vectorization path (to use the ei_p*) even for
> packets of one element, so we need another mechanism to specify whether a
> scalar type is vectorizable or not
> - usually, the vectorized path does not really make sens for packet size
> of 1 coeff, (on the other hand, the products are implemented on this model,
> so why not)
>
> all in all, currently I don't see any argument which would strongly kill one
> option in favor of the other, and so I would vote for the first one (own
> specialization of std::complex<double>) simply because it enables more
> optimizations and it is probably easier to do.
>
> gael.
>
>
>>
>> Cheers,
>> Benoit
>>
>> 2008/12/2 Gael Guennebaud <
gael.guennebaud@xxxxxxxxx>:
>> >
>> > all of this looks great,
>> >
>> > On Mon, Dec 1, 2008 at 11:31 PM, Benoit Jacob <
jacob.benoit.1@xxxxxxxxx>
>> > wrote:
>> >>
>> >> 2008/12/1 Tim Molteno <
tim@xxxxxxxxxxxxxxxxxxx>:
>> >>
>> >> for (unsigned i=0; i<N2; i++)
>> >> {
>> >> std::complex<T> temp(data[i+N2]*w);
>> >> data[i+N2] = data[i]-temp;
>> >> data[i] += temp;
>> >> w += w*wp;
>> >> }
>> >>
>> >> And I don't see any way to use Eigen expressions here, or to vectorize
>> >> that loop as such (again, no contiguous access).
>> >
>> > what about the following (let's assume the size of a packet is 2):
>> >
>> > const int PacketSize = 2;
>> > Scalar _w[PacketSize];
>> > _w[0] = 0;
>> > for (int k=1; k<PacketSize; ++k)
>> > _w[k] = _w[k-1] + _w[k-1] * wp;
>> > Packet w = ei_pload(_w);
>> > Packet wp2 = ei_pset1(wp);
>> > for (int i=0; i<N2; i+=PacketSize)
>> > {
>> > Packet tmp = ei_pmul(ei_pload(data+i+N2), w);
>> > Packet di = ei_pload(data+i);
>> > ei_store(data+i+N2, ei_psub(di, tmp));
>> > ei_store(data+i, ei_padd(di, tmp));
>> > w = ei_madd(w, wp2, w);
>> > }
>> >
>> > I think this should work whatever PacketSize is (including the case when
>> > the
>> > vectorization is disabled, i.e., PacketSize = 1)
>> >
>> > gael.
>> >
>>
>> ---
>>
>
>