Re: [eigen] Vectorization of complex
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Vectorization of complex
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Fri, 21 Jan 2011 22:17:04 +0100
On Fri, Jan 21, 2011 at 1:28 PM, Christoph Hertzberg
<chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 21.01.2011 13:02, Gael Guennebaud wrote:
>> note that our matrix-matrix product kernel for complexes does not use
>> this pmul function which is rather slow. The trick is to split the
>> products between the real and imaginary part and combine them at the
>> end of a series of mul-add.
>
> BTW: Have you tried the "3-multiplication-trick" for complex matrix
> multiplication yet:
>
> (A+iB)*(C+iD) = AC - BD + i[(A+B)*(C+D) - AC - BD]
>
> For big enough matrices this could give almost a 25% performance gain --
> at the cost of some precision loss (which could actually be quite large,
> e.g. if the imaginary part is much smaller than the real part).
As you say, this trick is numerically not accurate enough to be used in practice.
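
For reference, here is a minimal sketch of the 3-multiplication trick
written at the matrix level with plain Eigen calls (this is only an
illustration, not our packet-level kernel); it also makes the rounding
difference easy to observe:

// Illustrative sketch only: the 3-multiplication ("Karatsuba-like") trick
// for a complex matrix product, using real-valued blocks.
#include <Eigen/Dense>
#include <iostream>

int main()
{
  using Eigen::MatrixXd;
  using Eigen::MatrixXcd;

  const int n = 4;
  MatrixXcd lhs = MatrixXcd::Random(n, n);
  MatrixXcd rhs = MatrixXcd::Random(n, n);

  // Split into real blocks so that lhs = A + iB and rhs = C + iD.
  MatrixXd A = lhs.real(), B = lhs.imag();
  MatrixXd C = rhs.real(), D = rhs.imag();

  // Three real products instead of four.
  MatrixXd AC = A * C;
  MatrixXd BD = B * D;
  MatrixXd T  = (A + B) * (C + D);

  MatrixXcd result(n, n);
  result.real() = AC - BD;       // real part
  result.imag() = T - AC - BD;   // imaginary part: AD + BC

  // Compare with the straightforward product to see the rounding error.
  std::cout << (result - lhs * rhs).norm() << std::endl;
  return 0;
}
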
>> Well this pmul function is actually used N^2 times for the
>> multiplication with alpha. Recall that our kernel computes C += alpha
>> * A * B, and even if you only do C = A*B this product with alpha is
>> still there, taking alpha = 1.
>
> Why? I admit this might be necessary in non-template libraries to
> reduce/avoid code duplication, but I would have assumed that this could be
> avoided by template specializations somehow (I have never checked your
> multiplication kernel though ...)
The code is very heavy, so we don't want to instantiate almost the same
code twice for a very small performance gain. Also note that this code
path is for dynamic-size matrices. For small fixed-size ones, the path is
different and has no such implicit multiplicative factor.
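
To make the point concrete, here is a hypothetical, much-simplified
sketch (not the actual Eigen kernel, and the function name is made up) of
a single alpha-parameterized product routine; passing alpha = 1 reuses the
same instantiation for a plain C = A*B instead of compiling a second,
nearly identical kernel without the scaling:

// Hypothetical sketch of a dynamic-size product routine computing
// C += alpha * A * B for column-major storage.
#include <complex>
#include <vector>

template <typename Scalar>
void gemm_accum(Scalar* C, const Scalar* A, const Scalar* B,
                int rows, int cols, int depth, Scalar alpha)
{
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
    {
      Scalar acc = Scalar(0);
      for (int k = 0; k < depth; ++k)
        acc += A[i + k * rows] * B[k + j * depth]; // column-major access
      C[i + j * rows] += alpha * acc;              // alpha applied once per
    }                                              // coefficient, i.e. N^2 times
}

int main()
{
  using Cplx = std::complex<double>;
  const int n = 3;
  std::vector<Cplx> A(n * n, Cplx(1, 1)), B(n * n, Cplx(2, -1)), C(n * n);
  // C = A * B is obtained by zeroing C and calling with alpha = 1.
  gemm_accum(C.data(), A.data(), B.data(), n, n, n, Cplx(1, 0));
  return 0;
}
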
gael