Re: [eigen] A complex FFT for Eigen |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] A complex FFT for Eigen
- From: "Benoit Jacob" <jacob.benoit.1@xxxxxxxxx>
- Date: Tue, 2 Dec 2008 16:51:00 +0100
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:in-reply-to:mime-version:content-type :content-transfer-encoding:content-disposition:references; bh=ReG/xqGokYtUv+2u84kwEevt7MPIp26u5TcaAGP6rZc=; b=EU2+rH+i5U69m88vi8a3/OE3bHfpRDNV1wgp88f365c50IXVW00c8bIYh+L6Voxz0U yBAY2s7V36lW6tttDqIjZTVCyobcUQuhPle+Odv6KWUKVvVhBKz7QhrNUBzCArDqISwW JQOY8kKsmvb1h4mTQohK2ZgZzQFDAC8tR2loY=
- Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references; b=xDS9LXEk1jvK9It5sHRw+WjSxPIxuWLs+6pbBMr2li4/KZnXNZwokZ5Vgju1vi16vJ NQH58gfZHrLyaGWNVlnMWOXlXkagbIZ84FOr1CBCHPpJw+rLkY8o9P1bkjaCkjPnbusk r4HeHtn9lPLxUo8UWoSb9I3QN42J4nW7KJQMk=
Yes I agree, I too prefer the 1st solution: own specialization of
complex<double>. Actually I hadn't thought of that possibility.
Unfortunately, at least here in gcc 4.3, <complex> does specialize
complex for float and double. Not just as an external, but the actual
specialization is in the header. Which prevents us from doing our own.
Moreover, comments refer to what seem to be sections in the C++ spec,
so this is probably standard.
So at this point I want to say: OK so the c++ standard expects the std
lib implementors to fully take care of optimizing complex for float
and double, fine! So the next thing to do is to check the assembly
with some test program: enable SSE2 and check that the compiler really
produces vectorized code. An addition of complex<double> should be
vectorized for example. Looking at the <complex> header, there doesn't
seem to be any explicit vectorization, but perhaps they know that the
compiler does a good job here, or perhaps i don't understand what
these __real__, __imag__ pseudo keywords mean.
Cheers,
Benoit
> yes of course we first need to get the vectorization of complex done the
> right way, and I still don't know what's the best option. We can either
> provide our own specialization of std::complex<double> implementation or use
> the ei_p* functions:
>
> Own specialization of std::complex<double>
> * pro:
> - enable its optimization everywhere
> * cons:
> - we and the user have to make sure a std::complex<double> is always
> correctly aligned (easy to do unless the user starts doing fancy casts)
> - on the long term, the next gen CPUs will have 256bit registers, so a
> packet of std::complex<double> won't be made of a single coeff anymore...
>
> Using the ei_p* with packet of 1 coeff
> * pro:
> - no need to specialize std::complex<double>
> * cons:
> - need to enable the vectorization path (to use the ei_p*) even for
> packets of one element, so we need another mechanism to specify whether a
> scalar type is vectorizable or not
> - usually, the vectorized path does not really make sens for packet size
> of 1 coeff, (on the other hand, the products are implemented on this model,
> so why not)
>
> all in all, currently I don't see any argument which would strongly kill one
> option in favor of the other, and so I would vote for the first one (own
> specialization of std::complex<double>) simply because it enables more
> optimizations and it is probably easier to do.
>
> gael.
>
>
>>
>> Cheers,
>> Benoit
>>
>> 2008/12/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >
>> > all of this looks great,
>> >
>> > On Mon, Dec 1, 2008 at 11:31 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> > wrote:
>> >>
>> >> 2008/12/1 Tim Molteno <tim@xxxxxxxxxxxxxxxxxxx>:
>> >>
>> >> for (unsigned i=0; i<N2; i++)
>> >> {
>> >> std::complex<T> temp(data[i+N2]*w);
>> >> data[i+N2] = data[i]-temp;
>> >> data[i] += temp;
>> >> w += w*wp;
>> >> }
>> >>
>> >> And I don't see any way to use Eigen expressions here, or to vectorize
>> >> that loop as such (again, no contiguous access).
>> >
>> > what about the following (let's assume the size of a packet is 2):
>> >
>> > const int PacketSize = 2;
>> > Scalar _w[PacketSize];
>> > _w[0] = 0;
>> > for (int k=1; k<PacketSize; ++k)
>> > _w[k] = _w[k-1] + _w[k-1] * wp;
>> > Packet w = ei_pload(_w);
>> > Packet wp2 = ei_pset1(wp);
>> > for (int i=0; i<N2; i+=PacketSize)
>> > {
>> > Packet tmp = ei_pmul(ei_pload(data+i+N2), w);
>> > Packet di = ei_pload(data+i);
>> > ei_store(data+i+N2, ei_psub(di, tmp));
>> > ei_store(data+i, ei_padd(di, tmp));
>> > w = ei_madd(w, wp2, w);
>> > }
>> >
>> > I think this should work whatever PacketSize is (including the case when
>> > the
>> > vectorization is disabled, i.e., PacketSize = 1)
>> >
>> > gael.
>> >
>>
>> ---
>>
>
>
---