Re: [eigen] Issues regarding Quaternion-alignment and const Maps |
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Issues regarding Quaternion-alignment and const Maps
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Sat, 10 Jul 2010 11:38:33 +0200
Here is a variant that uses as much generic code as possible:
// Sign-bit masks: mask1 targets the low double, mask2 the high double.
const __m128d mask1 = _mm_castsi128_pd(_mm_set_epi32(0x0,0x0,0x80000000,0x0));
const __m128d mask2 = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
Quaternion<double> res;
typedef ei_packet_traits<double>::type Packet;
const double* a = _a.coeffs().data();
Packet b_xy = _b.coeffs().template packet<Aligned>(0);
Packet b_zw = _b.coeffs().template packet<Aligned>(2);
// Broadcast each coefficient of a into both lanes of a packet.
Packet a_xx = ei_pset1(a[0]);
Packet a_yy = ei_pset1(a[1]);
Packet a_zz = ei_pset1(a[2]);
Packet a_ww = ei_pset1(a[3]);
Packet t1, t2;
// res.xy = t1 -/+ reverse(t2)
t1 = ei_padd(ei_pmul(a_ww, b_xy), ei_pmul(a_yy, b_zw));
t2 = ei_psub(ei_pmul(a_zz, b_xy), ei_pmul(a_xx, b_zw));
#ifdef __SSE3__
ei_pstore(&res.x(), _mm_addsub_pd(t1, ei_preverse(t2)));
#else
// XOR (not OR) with the sign mask, so the lane is negated whatever its sign.
ei_pstore(&res.x(), ei_padd(t1, ei_pxor(mask1,ei_preverse(t2))));
#endif
// res.zw = t1 +/- reverse(t2)
t1 = ei_psub(ei_pmul(a_ww, b_zw), ei_pmul(a_yy, b_xy));
t2 = ei_padd(ei_pmul(a_zz, b_zw), ei_pmul(a_xx, b_xy));
#ifdef __SSE3__
ei_pstore(&res.z(), ei_preverse(_mm_addsub_pd(ei_preverse(t1), t2)));
#else
ei_pstore(&res.z(), ei_padd(t1, ei_pxor(mask2,ei_preverse(t2))));
#endif
return res;
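For checking the packet code, a plain scalar version of the same product can serve as a reference. The sketch below is not Eigen code: the `Quat` alias and the name `quatmul_scalar_ref` are made up for illustration, and the coefficients are assumed to be stored in (x, y, z, w) order, as in the listing above.

```cpp
#include <array>

// Hypothetical reference type: coefficients stored as x, y, z, w.
typedef std::array<double, 4> Quat;

// Scalar Hamilton product a * b. Each line corresponds to one lane
// produced by the packet code (x and y from the first store, z and w
// from the second).
inline Quat quatmul_scalar_ref(const Quat& a, const Quat& b) {
  const double ax = a[0], ay = a[1], az = a[2], aw = a[3];
  const double bx = b[0], by = b[1], bz = b[2], bw = b[3];
  Quat r;
  r[0] = aw * bx + ay * bz - (az * by - ax * bw);  // x = t1[0] - t2[1]
  r[1] = aw * by + ay * bw + (az * bx - ax * bz);  // y = t1[1] + t2[0]
  r[2] = aw * bz - ay * bx + (az * bw + ax * by);  // z = t1[0] + t2[1]
  r[3] = aw * bw - ay * by - (az * bz + ax * bx);  // w = t1[1] - t2[0]
  return r;
}
```

Comparing the output of the vectorized path against such a function on a few inputs with mixed signs is a cheap way to catch a wrong sign mask.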
Actually, my recent work on the vectorization of complexes, together with
this code, made me think that it would be a good idea to add ei_paddsub
and ei_psubadd functions, so that we could write generic vectorized
code for complexes and quaternions (generic in the sense that it would
work for all vector engines).
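To make the idea concrete, here is a rough SSE2-only sketch of what such helpers could look like for packets of two doubles. The names `sketch_paddsub` and `sketch_psubadd` are hypothetical, and the lane convention chosen for "subadd" is an assumption; only "addsub" is pinned down by the existing `_mm_addsub_pd` (subtract in the low lane, add in the high lane). The sign flip is done by XOR-ing with a vector holding -0.0 in the lane to negate.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: returns (a0 - b0, a1 + b1), the same lane
// pattern as the SSE3 instruction behind _mm_addsub_pd.
static inline __m128d sketch_paddsub(__m128d a, __m128d b) {
  // _mm_set_pd takes the high lane first: low lane = -0.0, high lane = 0.0.
  const __m128d sign_low = _mm_set_pd(0.0, -0.0);
  // XOR with -0.0 flips only the sign bit of the low lane of b.
  return _mm_add_pd(a, _mm_xor_pd(b, sign_low));
}

// Hypothetical helper (assumed convention): returns (a0 + b0, a1 - b1).
static inline __m128d sketch_psubadd(__m128d a, __m128d b) {
  const __m128d sign_high = _mm_set_pd(-0.0, 0.0);
  return _mm_add_pd(a, _mm_xor_pd(b, sign_high));
}
```

A real version would presumably be declared per packet type, with the SSE3 build mapping the first helper directly onto `_mm_addsub_pd`. Note that the -0.0 constants rely on signed zeros being preserved, so an integer sign mask may be the safer choice when aggressive floating-point flags are enabled.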
Here is how I see it, taking quaternion multiplication as the example.
We could have a generic
template<typename Quat> Quat ei_quatmul(Quat& a, Quat& b);
function calling an ei_quatmul_selector, which would be specialized for
the following three configurations:
1 - ei_packet_traits<Quat::Scalar>::size == 2 => the above code
2 - ei_packet_traits<Quat::Scalar>::size == 4 => the code we already
have, but written in a generic way
3 - otherwise => scalar path
We should also make sure that this function can be specialized for a
given scalar type/vector engine, in case some specific
optimizations can be done.
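The dispatch described above could be sketched roughly as follows. Everything here is a stand-in written for illustration, not existing Eigen code: `sketch_packet_traits` mimics ei_packet_traits with hard-coded sizes, `SketchQuat` is a minimal quaternion type, and the two packet specializations simply forward to the scalar product where the real vectorized bodies would go.

```cpp
// Hypothetical stand-in for ei_packet_traits: scalars per packet.
template <typename Scalar> struct sketch_packet_traits { enum { size = 1 }; };
template <> struct sketch_packet_traits<double> { enum { size = 2 }; };
template <> struct sketch_packet_traits<float> { enum { size = 4 }; };

// Minimal quaternion type for the sketch, stored as x, y, z, w.
template <typename T> struct SketchQuat {
  typedef T Scalar;
  T x, y, z, w;
};

// Plain scalar Hamilton product, used by every path in this sketch.
template <typename Quat>
Quat sketch_quatmul_scalar(const Quat& a, const Quat& b) {
  Quat r;
  r.x = a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y;
  r.y = a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x;
  r.z = a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w;
  r.w = a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z;
  return r;
}

// Primary template: configuration 3, the scalar path.
template <typename Quat,
          int PacketSize = sketch_packet_traits<typename Quat::Scalar>::size>
struct sketch_quatmul_selector {
  enum { path = 3 };
  static Quat run(const Quat& a, const Quat& b) {
    return sketch_quatmul_scalar(a, b);
  }
};

// Configuration 1: packets of 2 scalars (placeholder body).
template <typename Quat> struct sketch_quatmul_selector<Quat, 2> {
  enum { path = 1 };
  static Quat run(const Quat& a, const Quat& b) {
    return sketch_quatmul_scalar(a, b);  // packet code would go here
  }
};

// Configuration 2: packets of 4 scalars (placeholder body).
template <typename Quat> struct sketch_quatmul_selector<Quat, 4> {
  enum { path = 2 };
  static Quat run(const Quat& a, const Quat& b) {
    return sketch_quatmul_scalar(a, b);  // packet code would go here
  }
};

// Generic entry point: picks the selector from the scalar type.
template <typename Quat>
Quat sketch_quatmul(const Quat& a, const Quat& b) {
  return sketch_quatmul_selector<Quat>::run(a, b);
}
```

Because the selector is an ordinary class template, a full specialization for one particular quaternion type would override any of the three generic paths, which is one way to allow the engine-specific optimizations mentioned above.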
gael.
On Sat, Jul 10, 2010 at 1:11 AM, Christoph Hertzberg
<chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Benoit Jacob wrote:
>>
>> I have made a patch letting ei_pset1 use _mm_loaddup_pd when we have SSE3:
>>
>> template<> EIGEN_STRONG_INLINE Packet2d ei_pset1<double>(const double& from) {
>> #ifdef EIGEN_VECTORIZE_SSE3
>> return _mm_loaddup_pd(&from);
>> #else
>> Packet2d res = _mm_set_sd(from);
>> return ei_vec2d_swizzle1(res, 0, 0);
>> #endif
>> }
>>
>> But guess what? It's actually not faster (perhaps even a bit slower)
>> than our ei_vec2d_swizzle1!
>>
>> So let's just forget about it.
>>
>> Christoph, is _mm_loaddup_pd the only SSE3 intrinsic your code is
>> using ? If yes, by using ei_pset1 instead of _mm_loaddup_pd, you can
>> make your code work on SSE2 !
>
> I guess the most important SSE3 instruction is _mm_addsub_pd, which
> subtracts in the first element and adds in the second. If there is code
> that negates just one element, it could be replaced.
>
> Googling a bit suggests that the SSE way to do it is to XOR with
> {-0.0, 0.0} (or the other way around). I will try that ...
>
> Christoph
>
> --
> ----------------------------------------------
> Dipl.-Inf. Christoph Hertzberg
> Cartesium 0.051
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: (+49) 421-218-64252
> ----------------------------------------------