Re: [eigen] A complex FFT for Eigen

On Tue, Dec 2, 2008 at 2:10 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:

Arf good job! One typo, it's ei_pstore instead of ei_store, and
ei_pmadd instead of ei_madd (just in case Tim tries this code).

thanks for the fixes !

One thing though, is it premature optimization to write it in such a
lowlevel way even before we have global vectorization support for
complex numbers... I mean, your code is perfect for complex<float> but
leaves complex<double> outside in the cold. Maybe it'd be easier to
get a solution that works for both once we have global vectorization
support for complex. Maybe the first thing to do would be to add
specialisations of PacketMath.h for complex types, so we don't even
need to have our own complex class.

yes of course we first need to get the vectorization of complex done the right way, and I still don't know what's the best option. We can either provide our own specialization of std::complex<double> implementation or use the ei_p* functions:

Own specialization of std::complex<double>
* pro:
   - enable its optimization everywhere
* cons:
   - we and the user have to make sure a std::complex<double> is always correctly aligned (easy to do unless the user starts doing fancy casts)
   - on the long term, the next gen CPUs will have 256bit registers, so a packet of std::complex<double> won't be made of a single coeff anymore...

Using the ei_p* with packet of 1 coeff
* pro:
   - no need to specialize std::complex<double>
* cons:
   - need to enable the vectorization path (to use the ei_p*) even for packets of one element, so we need another mechanism to specify whether a scalar type is vectorizable or not
   - usually, the vectorized path does not really make sens for packet size of 1 coeff, (on the other hand, the products are implemented on this model, so why not)

all in all, currently I don't see any argument which would strongly kill one option in favor of the other, and so I would vote for the first one (own specialization of std::complex<double>) simply because it enables more optimizations and it is probably easier to do.

gael.

Cheers,
Benoit

2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:

>
> all of this looks great,
>
> On Mon, Dec 1, 2008 at 11:31 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> 2008/12/1 Tim Molteno <tim@xxxxxxxxxxxxxxxxxxx>:
>>
>> for (unsigned i=0; i<N2; i++)
>> {
>> std::complex<T> temp(data[i+N2]*w);
>> data[i+N2] = data[i]-temp;
>> data[i] += temp;
>> w += w*wp;
>> }
>>
>> And I don't see any way to use Eigen expressions here, or to vectorize
>> that loop as such (again, no contiguous access).
>
> what about the following (let's assume the size of a packet is 2):
>
> const int PacketSize = 2;
> Scalar _w[PacketSize];
> _w[0] = 0;
> for (int k=1; k<PacketSize; ++k)
> _w[k] = _w[k-1] + _w[k-1] * wp;
> Packet w = ei_pload(_w);
> Packet wp2 = ei_pset1(wp);
> for (int i=0; i<N2; i+=PacketSize)
> {
> Packet tmp = ei_pmul(ei_pload(data+i+N2), w);
> Packet di = ei_pload(data+i);
> ei_store(data+i+N2, ei_psub(di, tmp));
> ei_store(data+i, ei_padd(di, tmp));
> w = ei_madd(w, wp2, w);
> }
>
> I think this should work whatever PacketSize is (including the case when the
> vectorization is disabled, i.e., PacketSize = 1)
>
> gael.
>

---