Re: [eigen] BDCSVD update
Hi,

Some quick updates about the D&C SVD.

In the evaluator branch [1], I cleaned up the code a bit and implemented the application of HouseholderSequence block-wise. Now, the performance for a 1000^2 square matrix is as follows:

BDCSVD:       0.447019
BDCSVD U+V:   1.27355
dgesdd_:      0.48619  (Lapack)
dgesdd_ U+V:  1.20698  (Lapack)

When computing the SVD with both the U and V factors, the computation cost is roughly distributed as follows:

(1) - 25% for the bidiagonalization
(2) - 33% to update the m_naiveU and m_naiveV factors in the conquer step as matrix-matrix products [2].
(3) - 35% to apply the U and V factors of the bidiagonalization as HouseholderSequence to get the final U and V factors.

The rest represents only a small fraction. So, as you can see, we have reached a point where further optimizations are not straightforward, as all three of these parts are already implemented in a cache-friendly manner (if we set parallelism aside).

Nonetheless, I observed that in part (2), the m_naiveU and m_naiveV matrices are rather sparse: before the last update, about 50% of the coefficients are still zero. Moreover, these zeros seem to be well structured, as shown by the final m_naiveU matrix for a 20x20 random matrix:

100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
100100100101011101110
011011011010100010000
011011011010100010000
011011011010100010000
011011011010100010000
011011011010100010000
011011011010100010000
011011011010100010000
011011011010100010000
011011011010100010000
000000000000000000001

(here '1' represents any non-zero value)

So in theory, we should be able to get a 2x speedup for part (2), which is quite significant.

cheers,

On Tue, Aug 27, 2013 at 5:49 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

Congrats, with your new solver I now get:

(versus MKL)
BDCSVD:       0.61278
BDCSVD U+V:   5.05408
dgesdd_:      0.486337
dgesdd_ U+V:  1.20376

(versus Lapack)
BDCSVD:       1.94448
BDCSVD U+V:   11.7599
dgesdd_:      1.82727
dgesdd_ U+V:  3.36355

FYI, here the bidiagonalization step takes about 1.7s. So, as you guessed, there is major room for improvement in the computation of U and V.

gael.

On Tue, Aug 27, 2013 at 5:44 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

yes. The bidiagonalization step is now as fast as Lapack or Intel MKL (with a single thread), even though it is still dominated by matrix-vector products.

On Tue, Aug 27, 2013 at 4:54 PM, Jitse Niesen <jitse@xxxxxxxxxxxxxxxxx> wrote:
Gael, thanks for your work on bidiagonalization. Hopefully we are getting close to Lapack now.

Before your new solving method, I got these numbers (for a 1000 x 1000 random matrix):

on a Xeon X5570, versus Intel MKL:
BDCSVD:            1.03396
BDCSVD U+V:        5.39393
dgesdd_ (MKL):     0.53134
dgesdd_ (MKL) U+V: 1.25072

on a Core 2, versus Lapack:
BDCSVD:       2.35741
BDCSVD U+V:   12.1212
dgesdd_:      1.81886
dgesdd_ U+V:  3.46479

gael