Re: [eigen] BDCSVD update


To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] BDCSVD update
From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
Date: Fri, 5 Sep 2014 18:03:10 +0200

I implemented a naive approach to exploit the sparsity: https://bitbucket.org/ggael/eigen-evaluators/commits/3679e76e378f29ba6111d9c6f734f001c565dfc3, and the performance is now:

BDCSVD U+V: 1.11337

so part (2) is indeed almost twice as fast now.

Of course there must be a more elegant way to accomplish this!

Moreover, except for the last conquer step, which produces a dense matrix, the other steps could also exploit the sparsity of the resulting matrix to halve their computation cost. The overall gain would be much smaller, though, because those steps deal with much smaller problems than the last update (e.g., n^3 versus (n/2)^3).

gael

On Fri, Sep 5, 2014 at 3:29 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxxx> wrote:

Great work on getting a blocked bidiagonalization, and very interesting comment on the structure of the bidiagonalization unitaries!

Benoit

2014-09-05 9:16 GMT-04:00 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:

Hi, some quick updates about D&C SVD.

In the evaluator branch [1], I cleaned up the code a bit and implemented the application of HouseholderSequence block-wise. Now, the performance for a 1000^2 square matrix is as follows:

BDCSVD: 0.447019
BDCSVD U+V: 1.27355
dgesdd_: 0.48619 (Lapack)
dgesdd_ U+V: 1.20698 (Lapack)

When computing the SVD with both the U and V factors, the computation cost is roughly distributed as follows:

(1) 25% for the bidiagonalization
(2) 33% to update the m_naiveU and m_naiveV factors in the conquer step as matrix-matrix products [2].
(3) 35% to apply the U and V factors of the bidiagonalization as a HouseholderSequence to get the final U and V factors.

The rest represents only a small fraction. So, as you can see, we have reached a point where further optimizations are not straightforward, as all three parts are already implemented in a cache-friendly manner (if we forget about parallelism).

Nonetheless, I observed that in part (2) the m_naiveU and m_naiveV matrices are rather sparse: before the last update, about 50% of the coefficients are still zero. Moreover, these zeros seem to be well structured, as shown by the final m_naiveU matrix for a 20x20 random matrix:

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

100100100101011101110

011011011010100010000

011011011010100010000

011011011010100010000

011011011010100010000

011011011010100010000

011011011010100010000

011011011010100010000

011011011010100010000

011011011010100010000

000000000000000000001

(here '1' represents any nonzero value)

So in theory, we should be able to get a 2x speedup for part (2), which is quite significant.

cheers,

On Tue, Aug 27, 2013 at 5:49 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

Congrats, with your new solver I now get:

(versus MKL)
BDCSVD: 0.61278
BDCSVD U+V: 5.05408
dgesdd_: 0.486337
dgesdd_ U+V: 1.20376

(versus Lapack)
BDCSVD: 1.94448
BDCSVD U+V: 11.7599
dgesdd_: 1.82727
dgesdd_ U+V: 3.36355

FYI, here, the bidiagonalization step takes about 1.7s. So as you guessed, there is major room for improvement in computing U and V.

gael.

On Tue, Aug 27, 2013 at 5:44 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

yes. The bidiagonalization step is now as fast as Lapack or Intel MKL (with a single thread), even though it is still dominated by matrix-vector products.

On Tue, Aug 27, 2013 at 4:54 PM, Jitse Niesen <jitse@xxxxxxxxxxxxxxxxx> wrote:

Gael, thanks for your work on bidiagonalization. Hopefully we are getting close to Lapack now.

Before your new solving method, I got these numbers (for a 1000 x 1000 random matrix):

on a Xeon X5570 versus Intel MKL:
BDCSVD: 1.03396
BDCSVD U+V: 5.39393
dgesdd_(MKL): 0.53134
dgesdd_(MKL) U+V: 1.25072

on a Core 2 versus Lapack:
BDCSVD: 2.35741
BDCSVD U+V: 12.1212
dgesdd_: 1.81886
dgesdd_ U+V: 3.46479

gael

