Re: [eigen] Getting Householder reflections right
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Getting Householder reflections right
- From: Michael Meyer <mjhmeyer@xxxxxxxxx>
- Date: Mon, 11 May 2009 17:00:55 -0700 (PDT)
Greetings,
I reran the Tridiagonalization code for a 1000 x 1000 Hilbert matrix H
with version 2.01.
Decomposition (constructor): 0 secs
Extracting the matrix Q: 3 secs.
So this is all very good.
The accuracy was
L1-norm(H) = 1385
L1-norm(H-QTQ') = 0.0001956
i.e., this could get much more accurate if ei_IsMuchSmallerThan were
tightened up in this context.
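
For reference, the check I ran looks roughly like this (a sketch; the
class and method names are assumed rather than copied from 2.01, and
the "L1-norm" here is the entrywise absolute sum):

  #include <Eigen/Dense>
  #include <iostream>
  using namespace Eigen;

  int main()
  {
    const int n = 1000;
    MatrixXd H(n, n);                     // Hilbert matrix H(i,j) = 1/(i+j+1)
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j)
        H(i, j) = 1.0 / (i + j + 1);

    Tridiagonalization<MatrixXd> tri(H);  // decomposition runs in the constructor
    MatrixXd Q = tri.matrixQ();           // the extraction step timed above
    MatrixXd T = tri.matrixT();

    std::cout << "L1-norm(H)      = " << H.cwiseAbs().sum() << "\n"
              << "L1-norm(H-QTQ') = "
              << (H - Q * T * Q.transpose()).cwiseAbs().sum() << "\n";
  }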
I am truly in awe of this expression template design!!!
Cheers,
Mike
--- On Mon, 5/11/09, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
> From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
> Subject: Re: [eigen] Getting Householder reflections right
> To: eigen@xxxxxxxxxxxxxxxxxxx
> Date: Monday, May 11, 2009, 5:07 AM
> On Sun, May 10, 2009 at 5:34 PM, Michael Meyer <mjhmeyer@xxxxxxxxx> wrote:
> >
> > Greetings,
> >
> >
> > I looked at the code for the Hessenberg tridiagonalization of a
> > symmetric matrix A (Tridiagonalization::_compute).
> > It seems that the symmetry of A is not used to the fullest.
> >
> > With v denoting the Householder vector, Q = hvv' and H = I - Q,
> > we have to compute
> >
> > A = (I-Q')A(I-Q)
> >
> > The code does
> >
> > A1 = (I-Q')A, then A = A1(I-Q).
> >
> > The intermediate matrix A1 is not symmetric.
> > We need (h' denotes the complex conjugate of h)
> >
> > A1Q = h(A1v)v' and Q'A = (AQ)' = h'(Av)v',
> >
> > that is, we have two multiplications of the form (Av)v'.
> > If we do both multiplications in one step
> >
> > A = (I-Q')A(I-Q) = A - (Q'A + AQ) + Q'AQ
> >
> > then we can use that
> >
> > (a) A is symmetric, so compute only the upper half
> > (b) the matrix Q'A + AQ = (AQ)' + AQ is symmetric
> > (c) we compute and save the vector w = Av; then we get AQ = h(Av)v' = hwv', and
> > (d) Q'AQ = |h|^2(v'Av)vv' = |h|^2(v'w)vv', where vv' is again symmetric,
> >
> > so we have 1.5 multiplications of the form (Av)v' (vv' is half of
> > one because of the symmetry).
> > The symmetry in (b) does not do us much good since we still need
> > all of AQ.
> >
> > I am not sure it's worth going after a 25% speedup (if any) if the
> > code gets much uglier.
> > The constructor (which calls _compute and so already computes the
> > decomposition) is already very fast (1 sec for a 1000 x 1000 matrix).
>
> hm... this is already what is done:
>
> 1) compute w = Av (line 223)
> 2) compute u = hw - 0.5 h^2 (v'w) v (line 227)
> 3) compute A -= uv' + vu' (line 232)
>
> and only the lower triangular part of A is computed.
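
For the archives, here are those three steps written out explicitly
(an Eigen-style sketch with assumed names, not the actual _compute
code):

  #include <Eigen/Dense>
  using namespace Eigen;

  // One tridiagonalization step: A is selfadjoint (lower part used),
  // v the Householder vector, h the scalar coefficient.
  void rank2Update(MatrixXd& A, const VectorXd& v, double h)
  {
    VectorXd w = A.selfadjointView<Lower>() * v;        // 1) w = Av
    VectorXd u = h * w - 0.5 * h * h * v.dot(w) * v;    // 2) u = hw - 0.5 h^2 (v'w) v
    A.selfadjointView<Lower>().rankUpdate(u, v, -1.0);  // 3) A -= uv' + vu'
  }                                                     //    (lower part only)

Using the selfadjoint view keeps both the product and the rank-2
update on the lower triangle only, which is exactly the saving I was
describing.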
>
> > Returning the orthogonal matrix Q is much slower (39 secs) and I do
> > not understand why, since the computation is quite similar to the
> > one in the decomposition.
>
> 39 secs to extract Q! there is clearly something wrong here! anyway,
> Tridiag is going to be slightly (?), significantly (?) updated by
> Benoit, and we'll try not to forget to also bench the extraction!!
>
> > A poster on an earlier thread once commented that LAPACK is faster
> > for the symmetric eigenvalue problem.
> > This may be because the algorithms are written so that they make as
> > much use as possible of matrix-matrix multiplication instead of
> > matrix-vector multiplication.
>
> yes, the next step would be to implement a block version on
> top of this one.
>
> > A large effort then goes into speeding up matrix-matrix
> > multiplication, and the code is probably very ugly.
> > It is not realistic for Eigen to try and duplicate that.
>
> np, Eigen's matrix-matrix multiplication is already very fast; not as
> fast as MKL or GotoBLAS, but very close.
>
> Gael.
>
> >
> > Cheers,
> >
> > Mike
> >
> >
> >
> > --- On Fri, 5/8/09, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> >
> >> From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> >> Subject: Re: [eigen] Getting Householder reflections right
> >> To: eigen@xxxxxxxxxxxxxxxxxxx
> >> Date: Friday, May 8, 2009, 3:49 PM
> >> LOL.
> >> First I googled for "block householder", found this paper.
> >> Then Rohit IMs me and points me to the same paper.
> >> Then you. Small world!
> >>
> >> Anyway, I was about to draw the same conclusion: the construction
> >> of the block Householder reflector relies on ....(drumroll).... SVD!
> >>
> >> So yes, let's implement a plain SVD first.
> >>
> >> Cheers,
> >> Benoit
> >>
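
A side note for the archives: the common alternative to the SVD-based
construction in that paper is the compact WY representation, where
H_1 H_2 ... H_k = I - V T V' with T upper triangular. A rough
LAPACK-dlarft-style sketch of accumulating T; the names and storage
convention are assumed, this is not proposed Eigen code:

  #include <Eigen/Dense>
  using namespace Eigen;

  // Accumulate the triangular factor T of the compact WY form
  //   H_1 H_2 ... H_k = I - V T V',
  // given the Householder vectors as (full, unpacked) columns of V
  // and their coefficients tau.
  MatrixXd buildT(const MatrixXd& V, const VectorXd& tau)
  {
    const int k = static_cast<int>(V.cols());
    MatrixXd T = MatrixXd::Zero(k, k);
    for (int j = 0; j < k; ++j) {
      if (j > 0)   // T(0:j-1,j) = -tau(j) * T(0:j-1,0:j-1) * V(:,0:j-1)' * V(:,j)
        T.col(j).head(j) = -tau(j) * T.topLeftCorner(j, j)
                           * (V.leftCols(j).transpose() * V.col(j));
      T(j, j) = tau(j);
    }
    return T;
  }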
> >> 2009/5/8 Keir Mierle <mierle@xxxxxxxxx>:
> >> > I found this:
> >> >
> >> > http://www.jstage.jst.go.jp/article/ipsjdc/2/0/298/_pdf
> >> >
> >> > Looks complicated... I suggest implementing the Householder
> >> > bidiagonalization in a straightforward way first, where code
> >> > clarity and modularity go before speed. Then we can look at
> >> > blocking versions.
> >> >
> >> > Keir
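
Something like this, perhaps (an unblocked Golub-Kahan sketch, clarity
over speed; everything below is illustrative with assumed names, not
actual Eigen code):

  #include <Eigen/Dense>
  #include <cmath>
  using namespace Eigen;

  // Overwrite A (m x n, m >= n assumed) with its bidiagonal form:
  // alternately zero a column below the diagonal (left reflector) and
  // a row right of the superdiagonal (right reflector). Reflectors
  // are applied as full dense updates for clarity, not speed.
  void bidiagonalize(MatrixXd& A)
  {
    const int m = static_cast<int>(A.rows());
    const int n = static_cast<int>(A.cols());
    for (int j = 0; j < n; ++j) {
      {   // left reflector zeroing A(j+1:m-1, j)
        VectorXd x = A.col(j).tail(m - j);
        double beta = -std::copysign(x.norm(), x(0));
        VectorXd v = x;  v(0) -= beta;               // v = x - beta*e1
        double vtv = v.squaredNorm();
        if (vtv > 0.0) {
          double tau = 2.0 / vtv;
          auto B = A.bottomRightCorner(m - j, n - j);
          B -= tau * v * (v.transpose() * B);        // B = (I - tau*v*v') B
        }
      }
      if (j + 2 < n) {   // right reflector zeroing A(j, j+2:n-1)
        VectorXd x = A.row(j).tail(n - j - 1).transpose();
        double beta = -std::copysign(x.norm(), x(0));
        VectorXd v = x;  v(0) -= beta;
        double vtv = v.squaredNorm();
        if (vtv > 0.0) {
          double tau = 2.0 / vtv;
          auto B = A.bottomRightCorner(m - j, n - j - 1);
          B -= tau * (B * v) * v.transpose();        // B = B (I - tau*v*v')
        }
      }
    }
  }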
> >> >
> >> > On Fri, May 8, 2009 at 12:33 PM, Benoit Jacob
> >> > <jacob.benoit.1@xxxxxxxxx> wrote:
> >> >> thanks for the reminder, it's actually the right time to look
> >> >> at that, as it may impact the design.
> >> >> Benoit
> >> >>
> >> >> 2009/5/8 Rohit Garg <rpg.314@xxxxxxxxx>:
> >> >>> Possibly OT
> >> >>>
> >> >>> Maybe you should look at blockwise Householder transformations.
> >> >>> I'd in fact recommend that all algorithms, as far as possible,
> >> >>> be done in a blockwise manner.
> >> >>>
> >> >>> On Sat, May 9, 2009 at 12:52 AM, Benoit Jacob
> >> >>> <jacob.benoit.1@xxxxxxxxx> wrote:
> >> >>>> Hi,
> >> >>>>
> >> >>>> I'm wondering how to do Householder reflections right.
> >> >>>> Books (Golub & Van Loan) and libraries (LAPACK) seem to
> >> >>>> _always_ use the following packed storage for a Householder
> >> >>>> vector v: they normalize it so that v(0) = 1, hence they don't
> >> >>>> need to store v(0), which is useful for storing decompositions
> >> >>>> in a packed way in a single matrix.
> >> >>>> However, they store the norm of v separately, in order to
> >> >>>> avoid having to recompute it every time.
> >> >>>>
> >> >>>> I'm not saying this is always a bad idea; I'm saying it's not
> >> >>>> always a good idea.
> >> >>>>
> >> >>>> That's a good idea for the QR decomposition, because it allows
> >> >>>> storing the whole decomposition in a single matrix, plus a
> >> >>>> vector to store the norms of the v's.
> >> >>>>
> >> >>>> But right now I'm looking at bidiagonalization, and I'm
> >> >>>> seriously wondering if this packed storage is a good idea.
> >> >>>> With the packed storage, I can store in a single matrix the
> >> >>>> bidiagonal matrix and the essential parts of the Householder
> >> >>>> vectors, but then I need to store the norms of the Householder
> >> >>>> vectors in 2 separate vectors.
> >> >>>> While _without_ packed storage, I'd be able to store the whole
> >> >>>> Householder vectors entirely in a single matrix (of the same
> >> >>>> size), and then I'd store the bidiagonal matrix separately
> >> >>>> (obviously in a compact way).
> >> >>>> So here, I don't see any advantage in using packed Householder
> >> >>>> storage.
> >> >>>> On the other hand, NOT packing the Householder vectors will
> >> >>>> give simpler code and slightly better speed.
> >> >>>>
> >> >>>> Maybe the only advantage is to make our bidiagonalization
> >> >>>> storage compatible with LAPACK. But if that's important, we
> >> >>>> can add conversion methods, and, having n^2 complexity, the
> >> >>>> conversion won't have a big overhead compared to the n^3 cost
> >> >>>> of the decomposition itself.
> >> >>>>
> >> >>>> Same remarks apply to these close relatives: tridiagonalization
> >> >>>> and Hessenberg decomposition (which I'll rework along the way).
> >> >>>>
> >> >>>> Opinions?
> >> >>>>
> >> >>>> Cheers,
> >> >>>> Benoit
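
To make the packed convention concrete, here is a LAPACK-dlarfg-style
sketch (assumed names, illustrative only): v is scaled so that
v(0) = 1, only the essential part v(1:end) is stored in the matrix,
and tau is kept on the side:

  #include <Eigen/Dense>
  #include <cmath>
  using namespace Eigen;

  // Compute tau, beta and the essential part of v such that
  //   (I - tau*v*v') x = beta*e1   with   v = [1; essential].
  // Only 'essential' goes into the packed matrix; tau is stored
  // separately so the norm never has to be recomputed.
  void makeHouseholder(const VectorXd& x, VectorXd& essential,
                       double& tau, double& beta)
  {
    const double alpha = x(0);
    const double sigma = x.tail(x.size() - 1).squaredNorm();
    if (sigma == 0.0) {                    // x is already beta*e1
      tau = 0.0;  beta = alpha;
      essential.setZero(x.size() - 1);
      return;
    }
    beta = -std::copysign(std::sqrt(alpha * alpha + sigma), alpha);
    tau  = (beta - alpha) / beta;
    essential = x.tail(x.size() - 1) / (alpha - beta);
  }

The unpacked alternative would simply store v whole, which is exactly
the simplification Benoit describes.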
> >> >>>>
> >> >>> --
> >> >>> Rohit Garg
> >> >>> http://rpg-314.blogspot.com/
> >> >>> Senior Undergraduate
> >> >>> Department of Physics
> >> >>> Indian Institute of Technology Bombay