Re: [eigen] Getting Householder reflections right



On Sun, May 10, 2009 at 5:34 PM, Michael Meyer <mjhmeyer@xxxxxxxxx> wrote:
>
> Greetings,
>
>
> I looked at the code for the Hessenberg tridiagonalization of a symmetric matrix A
> (Tridiagonalization::_compute).
> It seems that the symmetry of A is not used to the fullest.
>
> With v denoting the Householder vector, Q=hvv' and H=I-Q
> we have to compute
>
> A = (I-Q')A(I-Q)
>
> The code does
>
> A1=(I-Q')A then A=A1(I-Q).
>
> The intermediate matrix A1 is not symmetric.
> We need (h' denotes the complex conjugate of h)
>
> A1Q = h(A1v)v' and Q'A = (AQ)' = h'v(Av)',
>
> that is, we have two multiplications of the form (Av)v'.
> If we do both multiplications in one step
>
> A = (I-Q')A(I-Q) = A-(Q'A+AQ)+Q'AQ
>
> then we can use that
>
> (a) A is symmetric, so compute only the upper half
> (b) the matrix Q'A+AQ = (AQ)'+AQ is symmetric
> (c) we compute and save the vector w=Av; then we get AQ=h(Av)v'=hwv' and
>
> (d) Q'AQ = |h|^2(v'Av)vv' = |h|^2(v'w)vv', where vv' is again symmetric
>
> so we have 1.5 multiplications of the form (Av)v' (vv' is half of one because of the symmetry).
> The symmetry in (b) does not do us much good since we still need all of AQ.
>
> I am not sure it's worth going after a 25% speedup (if any) if the code gets much uglier.
> The constructor (which calls _compute and so already computes the decomposition)
> is already very fast (1 sec for a 1000 x 1000 matrix).

hm... this is already what is done:

1) compute w = Av    (line 223)
2) compute u = hw - 0.5 h^2 (v'w) v   (line 227)
3) compute A -= uv' + vu' (line 232)

and only the lower triangular part of A is computed.

> Returning the orthogonal matrix Q is much slower (39 secs) and I do not understand why since the computation is quite
> similar to the one in the decomposition.

39 secs to extract Q! There is clearly something wrong here! Anyway,
Tridiag is going to be slightly (?), significantly (?) updated by
Benoit, and we'll try not to forget to also bench the extraction!!

> A poster on an earlier thread once commented that LAPACK is faster for the symmetric eigenvalue problem.
> This may be because the algorithms are written so that they make as much use as possible of matrix-matrix multiplication
> instead of matrix-vector multiplication.

yes, the next step would be to implement a block version on top of this one.

> A large effort then goes into speeding up matrix-matrix multiplication and the code is probably very ugly.
> It is not realistic for Eigen to try and duplicate that.

np, Eigen's matrix-matrix multiplication is already very fast: not as
fast as MKL or GotoBLAS, but very close.

Gael.

>
> Cheers,
>
> Mike
>
>
>
> --- On Fri, 5/8/09, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>
>> From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> Subject: Re: [eigen] Getting Householder reflections right
>> To: eigen@xxxxxxxxxxxxxxxxxxx
>> Date: Friday, May 8, 2009, 3:49 PM
>> LOL.
>> First I googled for "block householder", found this paper.
>> Then Rohit IMs me and points me to the same paper.
>> Then you. Small world!
>>
>> Anyway, I was about to draw the same conclusion: the construction of
>> the block Householder reflector relies on ... (drumroll) ... SVD!
>>
>> So yes, let's implement a plain SVD first.
>>
>> Cheers,
>> Benoit
>>
>> 2009/5/8 Keir Mierle <mierle@xxxxxxxxx>:
>> > I found this:
>> >
>> > http://www.jstage.jst.go.jp/article/ipsjdc/2/0/298/_pdf
>> >
>> > Looks complicated. I suggest implementing the Householder
>> > bidiagonalization in a straightforward way first, where code clarity
>> > and modularity go before speed. Then we can look at blocking
>> > versions.
>> >
>> > Keir
>> >
>> > On Fri, May 8, 2009 at 12:33 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> >> Thanks for the reminder; it's actually the right time to look at that,
>> >> as it may impact the design.
>> >> Benoit
>> >>
>> >> 2009/5/8 Rohit Garg <rpg.314@xxxxxxxxx>:
>> >>> Possibly OT
>> >>>
>> >>> Maybe you should look at blockwise Householder transformations. I'd
>> >>> in fact recommend that all algorithms, as far as possible, be done
>> >>> in a blockwise manner.
>> >>>
>> >>> On Sat, May 9, 2009 at 12:52 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> >>>> Hi,
>> >>>>
>> >>>> I'm wondering how to do Householder reflections right.
>> >>>> Books (Golub & Van Loan) and libraries (LAPACK) seem to _always_ use
>> >>>> the following packed storage for a Householder vector v:
>> >>>> they normalize it so that v(0)=1, hence they don't need to store v(0),
>> >>>> which is useful for storing decompositions in a packed way in a single
>> >>>> matrix.
>> >>>> However, they store the norm of v separately, in order to avoid having
>> >>>> to recompute it every time.
>> >>>>
>> >>>> I'm not saying this is always a bad idea; I'm saying it's not always
>> >>>> a good idea.
>> >>>>
>> >>>> It's a good idea for the QR decomposition, because it allows the
>> >>>> whole decomposition to be stored in a single matrix, plus a vector to
>> >>>> store the norms of the v's.
>> >>>>
>> >>>> But right now I'm looking at bidiagonalization and I'm seriously
>> >>>> wondering if this packed storage is a good idea.
>> >>>> With the packed storage, I can store in a single matrix the bidiagonal
>> >>>> matrix and the essential parts of the Householder vectors, but then I
>> >>>> need to store the norms of the Householder vectors in 2 separate
>> >>>> vectors.
>> >>>> While _without_ packed storage, I'd be able to store the whole
>> >>>> Householder vectors entirely in a single matrix (of the same size),
>> >>>> and then I'd store the bidiagonal matrix separately (obviously in a
>> >>>> compact way).
>> >>>> So here, I don't see any advantage in using packed Householder storage.
>> >>>> On the other hand, NOT packing the Householder vectors will give
>> >>>> simpler code and slightly better speed.
>> >>>>
>> >>>> Maybe the only advantage is to make our bidiagonalization storage
>> >>>> compatible with LAPACK. But if that's important, we can add conversion
>> >>>> methods, and, having n^2 complexity, the conversion won't have a big
>> >>>> overhead compared to the n^3 cost of the decomposition itself.
>> >>>>
>> >>>> Same remarks for these close relatives: tridiagonalization and
>> >>>> Hessenberg decomposition (which I'll rework along the way).
>> >>>>
>> >>>> Opinions?
>> >>>>
>> >>>> Cheers,
>> >>>> Benoit
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Rohit Garg
>> >>>
>> >>> http://rpg-314.blogspot.com/
>> >>>
>> >>> Senior Undergraduate
>> >>> Department of Physics
>> >>> Indian Institute of Technology
>> >>> Bombay
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>>
>
>
>
>
>
>


