Re: [eigen] A complex FFT for Eigen |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] A complex FFT for Eigen
- From: "Benoit Jacob" <jacob.benoit.1@xxxxxxxxx>
- Date: Tue, 2 Dec 2008 17:55:43 +0100
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:in-reply-to:mime-version:content-type :content-transfer-encoding:content-disposition:references; bh=tunMkL9nry8JSoHsjlZiQBxb8tTG3oeQb0g1pPqJIhY=; b=qIRoQJXPYErnG/3K3Pz6OtnsNnEQ8HT2e5jQxv+r4IF/1FcBE2JHwib3UiqUoqgzp7 Y5t9+UpRPa8uN6Ubm3NMrkQFv0uCUC4WdWWkrwbxN5eUDmKBs94b0Tw/GydV30EtAL/P RGLK5eN6jistFWyPnNbsydJrAcOmXT9xzhBJQ=
- Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references; b=BFVALs2rdY3Nju7Vnn5L3i6/WulDFLy+US006B7bDYkBpxwzBd6TfAvB1KjsYec4wz GEmR+jZYxLjYCf0PiAXNyFC6WV4A+gL0aGmt/5dV1UBWovZ0NGaELPykFuiXS5diOD4y jmt+eerPicDu91aU2UuG9gqkMgngNnRNzBDTI=
I see. So it's more a matter of just sending them a bug report.
Still not decided what to do on our side :)
Maybe the safest is to have our own complex class under a different
name (ei_complex).
Cheers,
Benoit
2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>
> hm yes, patching GCC's <complex> header file sounds a good idea, but:
>
> 1 - you must enforce 16 byte alignment => overload the operator new of
> complex<double> as well as all classes that store a complex<double>... (just
> like for Eigen::Vector2d)
> I'm not sure everybody we'll agree with that !
>
> 2 - it seems GCC's <complex> header file refers to an internal complex type
> and to internal builtins, e.g.:
> typedef __complex__ double _ComplexT;
> I don't know why, but I guess the initial goal was to allow more
> advanced optimization, so basically patch GCC itself rather than simply
> patching the header file.
>
> Maybe I'm completely wrong about argument 2, we should ask them about that,
> but I guess that argument 1 really implies to patch GCC itself such that the
> vectorization of complex<double> would be enabled only when the data are
> known to be aligned at compile time. Not really worth the effort !
>
> Cheers,
> gael.
>
> On Tue, Dec 2, 2008 at 5:25 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> It seems to me that the optimal solution is to patch <complex> to
>> vectorize std::complex<double>.
>> What do you think?
>> I could live of the idea of saying "yeah i've submitted a patch or two
>> to GCC" in a party.
>>
>> Another option (basically, if they don't accept our patches) is to
>> have our own <complex> having the same name. If the user hasn't
>> already #included<complex> (as we can tell from the preprocessor
>> symbol guarding it) then we include our own instead, and we define all
>> possible preprocessor symbols to make sure that subsequent inclusion
>> of <complex> by the user is blocked.
>>
>> Cheers,
>> Benoit
>>
>> 2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >
>> > On Tue, Dec 2, 2008 at 4:51 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> > wrote:
>> >>
>> >> Yes I agree, I too prefer the 1st solution: own specialization of
>> >> complex<double>. Actually I hadn't thought of that possibility.
>> >>
>> >> Unfortunately, at least here in gcc 4.3, <complex> does specialize
>> >> complex for float and double. Not just as an external, but the actual
>> >> specialization is in the header. Which prevents us from doing our own.
>> >>
>> >> Moreover, comments refer to what seem to be sections in the C++ spec,
>> >> so this is probably standard.
>> >
>> > I see, that's a pity.
>> >
>> >> So at this point I want to say: OK so the c++ standard expects the std
>> >> lib implementors to fully take care of optimizing complex for float
>> >> and double, fine! So the next thing to do is to check the assembly
>> >> with some test program: enable SSE2 and check that the compiler really
>> >> produces vectorized code. An addition of complex<double> should be
>> >> vectorized for example. Looking at the <complex> header, there doesn't
>> >> seem to be any explicit vectorization, but perhaps they know that the
>> >> compiler does a good job here, or perhaps i don't understand what
>> >> these __real__, __imag__ pseudo keywords mean.
>> >
>> > I've just checked, and g++ does not vectorize at all complex<double> :(
>> >
>> >
>> > gael.
>> >
>> >
>> >>
>> >> Cheers,
>> >> Benoit
>> >>
>> >>
>> >> > yes of course we first need to get the vectorization of complex done
>> >> > the
>> >> > right way, and I still don't know what's the best option. We can
>> >> > either
>> >> > provide our own specialization of std::complex<double> implementation
>> >> > or
>> >> > use
>> >> > the ei_p* functions:
>> >> >
>> >> > Own specialization of std::complex<double>
>> >> > * pro:
>> >> > - enable its optimization everywhere
>> >> > * cons:
>> >> > - we and the user have to make sure a std::complex<double> is
>> >> > always
>> >> > correctly aligned (easy to do unless the user starts doing fancy
>> >> > casts)
>> >> > - on the long term, the next gen CPUs will have 256bit registers,
>> >> > so
>> >> > a
>> >> > packet of std::complex<double> won't be made of a single coeff
>> >> > anymore...
>> >> >
>> >> > Using the ei_p* with packet of 1 coeff
>> >> > * pro:
>> >> > - no need to specialize std::complex<double>
>> >> > * cons:
>> >> > - need to enable the vectorization path (to use the ei_p*) even
>> >> > for
>> >> > packets of one element, so we need another mechanism to specify
>> >> > whether
>> >> > a
>> >> > scalar type is vectorizable or not
>> >> > - usually, the vectorized path does not really make sens for
>> >> > packet
>> >> > size
>> >> > of 1 coeff, (on the other hand, the products are implemented on this
>> >> > model,
>> >> > so why not)
>> >> >
>> >> > all in all, currently I don't see any argument which would strongly
>> >> > kill
>> >> > one
>> >> > option in favor of the other, and so I would vote for the first one
>> >> > (own
>> >> > specialization of std::complex<double>) simply because it enables
>> >> > more
>> >> > optimizations and it is probably easier to do.
>> >> >
>> >> > gael.
>> >> >
>> >> >
>> >> >>
>> >> >> Cheers,
>> >> >> Benoit
>> >> >>
>> >> >> 2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >> >> >
>> >> >> > all of this looks great,
>> >> >> >
>> >> >> > On Mon, Dec 1, 2008 at 11:31 PM, Benoit Jacob
>> >> >> > <jacob.benoit.1@xxxxxxxxx>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> 2008/12/1 Tim Molteno <tim@xxxxxxxxxxxxxxxxxxx>:
>> >> >> >>
>> >> >> >> for (unsigned i=0; i<N2; i++)
>> >> >> >> {
>> >> >> >> std::complex<T> temp(data[i+N2]*w);
>> >> >> >> data[i+N2] = data[i]-temp;
>> >> >> >> data[i] += temp;
>> >> >> >> w += w*wp;
>> >> >> >> }
>> >> >> >>
>> >> >> >> And I don't see any way to use Eigen expressions here, or to
>> >> >> >> vectorize
>> >> >> >> that loop as such (again, no contiguous access).
>> >> >> >
>> >> >> > what about the following (let's assume the size of a packet is 2):
>> >> >> >
>> >> >> > const int PacketSize = 2;
>> >> >> > Scalar _w[PacketSize];
>> >> >> > _w[0] = 0;
>> >> >> > for (int k=1; k<PacketSize; ++k)
>> >> >> > _w[k] = _w[k-1] + _w[k-1] * wp;
>> >> >> > Packet w = ei_pload(_w);
>> >> >> > Packet wp2 = ei_pset1(wp);
>> >> >> > for (int i=0; i<N2; i+=PacketSize)
>> >> >> > {
>> >> >> > Packet tmp = ei_pmul(ei_pload(data+i+N2), w);
>> >> >> > Packet di = ei_pload(data+i);
>> >> >> > ei_store(data+i+N2, ei_psub(di, tmp));
>> >> >> > ei_store(data+i, ei_padd(di, tmp));
>> >> >> > w = ei_madd(w, wp2, w);
>> >> >> > }
>> >> >> >
>> >> >> > I think this should work whatever PacketSize is (including the
>> >> >> > case
>> >> >> > when
>> >> >> > the
>> >> >> > vectorization is disabled, i.e., PacketSize = 1)
>> >> >> >
>> >> >> > gael.
>> >> >> >
>> >> >>
>> >> >> ---
>> >> >>
>> >> >
>> >> >
>> >>
>> >> ---
>> >>
>> >
>> >
>>
>> ---
>>
>
>
---