Re: [eigen] A complex FFT for Eigen

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


I see. So it's more a matter of just sending them a bug report.

Still not decided what to do on our side :)

Maybe the safest is to have our own complex class under a different
name (ei_complex).

Cheers,
Benoit

2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>
> hm yes, patching GCC's <complex> header file sounds a good idea, but:
>
> 1 - you must enforce 16 byte alignment => overload the operator new of
> complex<double> as well as all classes that store a complex<double>... (just
> like for Eigen::Vector2d)
>     I'm not sure everybody we'll agree with that !
>
> 2 - it seems GCC's <complex> header file refers to an internal complex type
> and to internal builtins, e.g.:
>     typedef __complex__ double _ComplexT;
>     I don't know why, but I guess the initial goal was to allow more
> advanced optimization, so basically patch GCC itself rather than simply
> patching the header file.
>
> Maybe I'm completely wrong about argument 2, we should ask them about that,
> but I guess that argument 1 really implies to patch GCC itself such that the
> vectorization of complex<double> would be enabled only when the data are
> known to be aligned at compile time. Not really worth the effort !
>
> Cheers,
> gael.
>
> On Tue, Dec 2, 2008 at 5:25 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> It seems to me that the optimal solution is to patch <complex> to
>> vectorize std::complex<double>.
>> What do you think?
>> I could live of the idea of saying "yeah i've submitted a patch or two
>> to GCC" in a party.
>>
>> Another option (basically, if they don't accept our patches) is to
>> have our own <complex> having the same name. If the user hasn't
>> already #included<complex> (as we can tell from the preprocessor
>> symbol guarding it) then we include our own instead, and we define all
>> possible preprocessor symbols to make sure that subsequent inclusion
>> of <complex> by the user is blocked.
>>
>> Cheers,
>> Benoit
>>
>> 2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >
>> > On Tue, Dec 2, 2008 at 4:51 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> > wrote:
>> >>
>> >> Yes I agree, I too prefer the 1st solution: own specialization of
>> >> complex<double>. Actually I hadn't thought of that possibility.
>> >>
>> >> Unfortunately, at least here in gcc 4.3, <complex> does specialize
>> >> complex for float and double. Not just as an external, but the actual
>> >> specialization is in the header. Which prevents us from doing our own.
>> >>
>> >> Moreover, comments refer to what seem to be sections in the C++ spec,
>> >> so this is probably standard.
>> >
>> > I see, that's a pity.
>> >
>> >> So at this point I want to say: OK so the c++ standard expects the std
>> >> lib implementors to fully take care of optimizing complex for float
>> >> and double, fine! So the next thing to do is to check the assembly
>> >> with some test program: enable SSE2 and check that the compiler really
>> >> produces vectorized code. An addition of complex<double> should be
>> >> vectorized for example. Looking at the <complex> header, there doesn't
>> >> seem to be any explicit vectorization, but perhaps they know that the
>> >> compiler does a good job here, or perhaps i don't understand what
>> >> these __real__, __imag__ pseudo keywords mean.
>> >
>> > I've just checked, and g++ does not vectorize at all complex<double> :(
>> >
>> >
>> > gael.
>> >
>> >
>> >>
>> >> Cheers,
>> >> Benoit
>> >>
>> >>
>> >> > yes of course we first need to get the vectorization of complex done
>> >> > the
>> >> > right way, and I still don't know what's the best option. We can
>> >> > either
>> >> > provide our own specialization of std::complex<double> implementation
>> >> > or
>> >> > use
>> >> > the ei_p* functions:
>> >> >
>> >> > Own specialization of std::complex<double>
>> >> >  * pro:
>> >> >    - enable its optimization everywhere
>> >> >  * cons:
>> >> >    - we and the user have to make sure a std::complex<double> is
>> >> > always
>> >> > correctly aligned (easy to do unless the user starts doing fancy
>> >> > casts)
>> >> >    - on the long term, the next gen CPUs will have 256bit registers,
>> >> > so
>> >> > a
>> >> > packet of std::complex<double> won't be made of a single coeff
>> >> > anymore...
>> >> >
>> >> > Using the ei_p* with packet of 1 coeff
>> >> >  * pro:
>> >> >    - no need to specialize std::complex<double>
>> >> >  * cons:
>> >> >    - need to enable the vectorization path (to use the ei_p*) even
>> >> > for
>> >> > packets of one element, so we need another mechanism to specify
>> >> > whether
>> >> > a
>> >> > scalar type is vectorizable or not
>> >> >    - usually, the vectorized path does not really make sens for
>> >> > packet
>> >> > size
>> >> > of 1 coeff, (on the other hand, the products are implemented on this
>> >> > model,
>> >> > so why not)
>> >> >
>> >> > all in all, currently I don't see any argument which would strongly
>> >> > kill
>> >> > one
>> >> > option in favor of the other, and so I would vote for the first one
>> >> > (own
>> >> > specialization of std::complex<double>) simply because it enables
>> >> > more
>> >> > optimizations and it is probably easier to do.
>> >> >
>> >> > gael.
>> >> >
>> >> >
>> >> >>
>> >> >> Cheers,
>> >> >> Benoit
>> >> >>
>> >> >> 2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >> >> >
>> >> >> > all of this looks great,
>> >> >> >
>> >> >> > On Mon, Dec 1, 2008 at 11:31 PM, Benoit Jacob
>> >> >> > <jacob.benoit.1@xxxxxxxxx>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> 2008/12/1 Tim Molteno <tim@xxxxxxxxxxxxxxxxxxx>:
>> >> >> >>
>> >> >> >>                for (unsigned i=0; i<N2; i++)
>> >> >> >>                {
>> >> >> >>                        std::complex<T> temp(data[i+N2]*w);
>> >> >> >>                        data[i+N2] = data[i]-temp;
>> >> >> >>                        data[i] += temp;
>> >> >> >>                        w += w*wp;
>> >> >> >>                }
>> >> >> >>
>> >> >> >> And I don't see any way to use Eigen expressions here, or to
>> >> >> >> vectorize
>> >> >> >> that loop as such (again, no contiguous access).
>> >> >> >
>> >> >> > what about the following (let's assume the size of a packet is 2):
>> >> >> >
>> >> >> > const int PacketSize = 2;
>> >> >> > Scalar _w[PacketSize];
>> >> >> > _w[0] = 0;
>> >> >> > for (int k=1; k<PacketSize; ++k)
>> >> >> >   _w[k] = _w[k-1] + _w[k-1] * wp;
>> >> >> > Packet w = ei_pload(_w);
>> >> >> > Packet wp2 = ei_pset1(wp);
>> >> >> > for (int i=0; i<N2; i+=PacketSize)
>> >> >> > {
>> >> >> >   Packet tmp = ei_pmul(ei_pload(data+i+N2), w);
>> >> >> >   Packet di = ei_pload(data+i);
>> >> >> >   ei_store(data+i+N2, ei_psub(di, tmp));
>> >> >> >   ei_store(data+i,       ei_padd(di, tmp));
>> >> >> >   w = ei_madd(w, wp2, w);
>> >> >> > }
>> >> >> >
>> >> >> > I think this should work whatever PacketSize is (including the
>> >> >> > case
>> >> >> > when
>> >> >> > the
>> >> >> > vectorization is disabled, i.e., PacketSize = 1)
>> >> >> >
>> >> >> > gael.
>> >> >> >
>> >> >>
>> >> >> ---
>> >> >>
>> >> >
>> >> >
>> >>
>> >> ---
>> >>
>> >
>> >
>>
>> ---
>>
>
>

---


Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/