>
> all of this looks great,
>
> On Mon, Dec 1, 2008 at 11:31 PM, Benoit Jacob <
jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> 2008/12/1 Tim Molteno <
tim@xxxxxxxxxxxxxxxxxxx>:
>>
>> for (unsigned i=0; i<N2; i++)
>> {
>> std::complex<T> temp(data[i+N2]*w);
>> data[i+N2] = data[i]-temp;
>> data[i] += temp;
>> w += w*wp;
>> }
>>
>> And I don't see any way to use Eigen expressions here, or to vectorize
>> that loop as such (again, no contiguous access).
>
> what about the following (let's assume the size of a packet is 2):
>
> const int PacketSize = 2;
> Scalar _w[PacketSize];
> _w[0] = 0;
> for (int k=1; k<PacketSize; ++k)
> _w[k] = _w[k-1] + _w[k-1] * wp;
> Packet w = ei_pload(_w);
> Packet wp2 = ei_pset1(wp);
> for (int i=0; i<N2; i+=PacketSize)
> {
> Packet tmp = ei_pmul(ei_pload(data+i+N2), w);
> Packet di = ei_pload(data+i);
> ei_store(data+i+N2, ei_psub(di, tmp));
> ei_store(data+i, ei_padd(di, tmp));
> w = ei_madd(w, wp2, w);
> }
>
> I think this should work whatever PacketSize is (including the case when the
> vectorization is disabled, i.e., PacketSize = 1)
>
> gael.
>