Re: [eigen] Vectorization of complex


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] Vectorization of complex
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Fri, 21 Jan 2011 13:02:41 +0100

note that our matrix-matrix product kernel for complexes does not use this pmul function which is rather slow. The trick is to split the products between the real and imaginary part and combine them at the end of a series of mul-add.

Well this pmul function is actually used N^2 times for the multiplication with alpha. Recall that our kernel computes C += alpha * A * B, and even if you only do C = A*B this product with alpha is still there, taking alpha = 1.

gael

On Fri, Jan 21, 2011 at 12:18 PM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 20.01.2011 23:11, David Luitz wrote:
>> I then started testing the code and realized that unfortunately my
>> implementation is a bit slower than the SSE2 version. Even more
>> puzzling: Actually, the already existing SSE3 implementation is ALSO
>> SLOWER than the SSE2 code! Does anybody have an idea, why my SSE4_1 code
>> is even slower than the SSE3 code?
>
> Just an uneducated guess:
> Especially for older processors it could be that it only emulates SSE3
> and SSE4_* instructions and is therefore slower (I had a similar thing
> with an old AMD64 and SSE2 once). Though in more complex programs it
> could be faster due to smaller code-size.
>
>> By the way, we are talking about something like 1 percent run time
>> difference in my tests, but still if the SSE3 and SSE4 codes are not
>> really faster than SSE2, I think they should be removed...
>
> At least this should be tested for different CPUs first ...
> Maybe also make general suggestions such as: "Don't enable SSE3 for ..."
> in the vectorization documentation.
>
> Regards
> Christoph
>
> --
> ----------------------------------------------
> Dipl.-Inf. Christoph Hertzberg
> Cartesium 0.051
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: (+49) 421-218-64252
> ----------------------------------------------
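For readers following the thread, the split-and-combine idea described at the top of this message can be sketched in scalar C++ roughly as follows. This is only an illustration under stated assumptions, not Eigen's actual kernel: the function name `split_mul_add` and its signature are hypothetical, it handles a single dot-product-like entry of C rather than a blocked matrix product, and it uses plain floats where the real kernel would use SIMD packets. The point is that the inner loop contains only real multiply-adds, and the complex combination plus the product with alpha happen once at the end.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Eigen: returns c + alpha * sum_k a[k] * b[k].
std::complex<float> split_mul_add(const std::vector<std::complex<float>>& a,
                                  const std::vector<std::complex<float>>& b,
                                  std::complex<float> alpha,
                                  std::complex<float> c)
{
    assert(a.size() == b.size());

    // acc_re accumulates a[k] * real(b[k]) as a (real, imag) pair,
    // acc_im accumulates a[k] * imag(b[k]) as a (real, imag) pair.
    float acc_re[2] = {0.0f, 0.0f};
    float acc_im[2] = {0.0f, 0.0f};

    for (std::size_t k = 0; k < a.size(); ++k) {
        const float ar = a[k].real(), ai = a[k].imag();
        const float br = b[k].real(), bi = b[k].imag();
        // Only real multiply-adds in the loop, no complex product.
        acc_re[0] += ar * br;
        acc_re[1] += ai * br;
        acc_im[0] += ar * bi;
        acc_im[1] += ai * bi;
    }

    // Combine once: multiplying acc_im by i swaps its two lanes and
    // negates the one that lands in the real part.
    const std::complex<float> ab(acc_re[0] - acc_im[1],
                                 acc_re[1] + acc_im[0]);

    // The single full complex product per output entry is the one with alpha.
    return c + alpha * ab;
}
```

With alpha = 1 and c = 0 this reduces to the plain sum of products, which matches the remark that the product with alpha is still performed even for C = A*B.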
