Re: [eigen] A complex FFT for Eigen

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]



hm yes, patching GCC's <complex> header file sounds a good idea, but:

1 - you must enforce 16 byte alignment => overload the operator new of complex<double> as well as all classes that store a complex<double>... (just like for Eigen::Vector2d)
    I'm not sure everybody we'll agree with that !

2 - it seems GCC's <complex> header file refers to an internal complex type and to internal builtins, e.g.:
    typedef __complex__ double _ComplexT;
    I don't know why, but I guess the initial goal was to allow more advanced optimization, so basically patch GCC itself rather than simply patching the header file.

Maybe I'm completely wrong about argument 2, we should ask them about that, but I guess that argument 1 really implies to patch GCC itself such that the vectorization of complex<double> would be enabled only when the data are known to be aligned at compile time. Not really worth the effort !

Cheers,
gael.

On Tue, Dec 2, 2008 at 5:25 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
It seems to me that the optimal solution is to patch <complex> to
vectorize std::complex<double>.
What do you think?
I could live of the idea of saying "yeah i've submitted a patch or two
to GCC" in a party.

Another option (basically, if they don't accept our patches) is to
have our own <complex> having the same name. If the user hasn't
already #included<complex> (as we can tell from the preprocessor
symbol guarding it) then we include our own instead, and we define all
possible preprocessor symbols to make sure that subsequent inclusion
of <complex> by the user is blocked.

Cheers,
Benoit

2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>
> On Tue, Dec 2, 2008 at 4:51 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> Yes I agree, I too prefer the 1st solution: own specialization of
>> complex<double>. Actually I hadn't thought of that possibility.
>>
>> Unfortunately, at least here in gcc 4.3, <complex> does specialize
>> complex for float and double. Not just as an external, but the actual
>> specialization is in the header. Which prevents us from doing our own.
>>
>> Moreover, comments refer to what seem to be sections in the C++ spec,
>> so this is probably standard.
>
> I see, that's a pity.
>
>> So at this point I want to say: OK so the c++ standard expects the std
>> lib implementors to fully take care of optimizing complex for float
>> and double, fine! So the next thing to do is to check the assembly
>> with some test program: enable SSE2 and check that the compiler really
>> produces vectorized code. An addition of complex<double> should be
>> vectorized for example. Looking at the <complex> header, there doesn't
>> seem to be any explicit vectorization, but perhaps they know that the
>> compiler does a good job here, or perhaps i don't understand what
>> these __real__, __imag__ pseudo keywords mean.
>
> I've just checked, and g++ does not vectorize at all complex<double> :(
>
>
> gael.
>
>
>>
>> Cheers,
>> Benoit
>>
>>
>> > yes of course we first need to get the vectorization of complex done the
>> > right way, and I still don't know what's the best option. We can either
>> > provide our own specialization of std::complex<double> implementation or
>> > use
>> > the ei_p* functions:
>> >
>> > Own specialization of std::complex<double>
>> >  * pro:
>> >    - enable its optimization everywhere
>> >  * cons:
>> >    - we and the user have to make sure a std::complex<double> is always
>> > correctly aligned (easy to do unless the user starts doing fancy casts)
>> >    - on the long term, the next gen CPUs will have 256bit registers, so
>> > a
>> > packet of std::complex<double> won't be made of a single coeff
>> > anymore...
>> >
>> > Using the ei_p* with packet of 1 coeff
>> >  * pro:
>> >    - no need to specialize std::complex<double>
>> >  * cons:
>> >    - need to enable the vectorization path (to use the ei_p*) even for
>> > packets of one element, so we need another mechanism to specify whether
>> > a
>> > scalar type is vectorizable or not
>> >    - usually, the vectorized path does not really make sens for packet
>> > size
>> > of 1 coeff, (on the other hand, the products are implemented on this
>> > model,
>> > so why not)
>> >
>> > all in all, currently I don't see any argument which would strongly kill
>> > one
>> > option in favor of the other, and so I would vote for the first one (own
>> > specialization of std::complex<double>) simply because it enables more
>> > optimizations and it is probably easier to do.
>> >
>> > gael.
>> >
>> >
>> >>
>> >> Cheers,
>> >> Benoit
>> >>
>> >> 2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >> >
>> >> > all of this looks great,
>> >> >
>> >> > On Mon, Dec 1, 2008 at 11:31 PM, Benoit Jacob
>> >> > <jacob.benoit.1@xxxxxxxxx>
>> >> > wrote:
>> >> >>
>> >> >> 2008/12/1 Tim Molteno <tim@xxxxxxxxxxxxxxxxxxx>:
>> >> >>
>> >> >>                for (unsigned i=0; i<N2; i++)
>> >> >>                {
>> >> >>                        std::complex<T> temp(data[i+N2]*w);
>> >> >>                        data[i+N2] = data[i]-temp;
>> >> >>                        data[i] += temp;
>> >> >>                        w += w*wp;
>> >> >>                }
>> >> >>
>> >> >> And I don't see any way to use Eigen expressions here, or to
>> >> >> vectorize
>> >> >> that loop as such (again, no contiguous access).
>> >> >
>> >> > what about the following (let's assume the size of a packet is 2):
>> >> >
>> >> > const int PacketSize = 2;
>> >> > Scalar _w[PacketSize];
>> >> > _w[0] = 0;
>> >> > for (int k=1; k<PacketSize; ++k)
>> >> >   _w[k] = _w[k-1] + _w[k-1] * wp;
>> >> > Packet w = ei_pload(_w);
>> >> > Packet wp2 = ei_pset1(wp);
>> >> > for (int i=0; i<N2; i+=PacketSize)
>> >> > {
>> >> >   Packet tmp = ei_pmul(ei_pload(data+i+N2), w);
>> >> >   Packet di = ei_pload(data+i);
>> >> >   ei_store(data+i+N2, ei_psub(di, tmp));
>> >> >   ei_store(data+i,       ei_padd(di, tmp));
>> >> >   w = ei_madd(w, wp2, w);
>> >> > }
>> >> >
>> >> > I think this should work whatever PacketSize is (including the case
>> >> > when
>> >> > the
>> >> > vectorization is disabled, i.e., PacketSize = 1)
>> >> >
>> >> > gael.
>> >> >
>> >>
>> >> ---
>> >>
>> >
>> >
>>
>> ---
>>
>
>

---




Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/