Re: [eigen] Matrix product between some MatrixType and triangularView<UnitLower> |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
&On Tue, 15 Jun 2010 16:29:12 +0200, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> Actually, the pb is line 50 where you store a reference to the
> temporary matrix returned by BlockHouseholderTriangularFactor.
>
> I would rather write it like this:
>
> const TriangularView<Derived,UnitLower>& Vtri(V);
> TriangularFactorType T = BlockHouseholderTriangularFactor(Vtmp,
coeffs,V);
> A -= Vtri * T.transpose() * Vtri.transpose() * A;
When using "const TriangularView<Derived,UnitLower>& Vtri(V)" the result
is changed :/
I've joined the result of a computation on some 8x8 matrix (with a block
factor of 2) and,
as you can the, the 4x4 right bottom corner is wrong...
>
> But here there are still many many temporaries: one for T and one for
> each matrix product! Also here the matrix T is triangular right ? So
> I'd suggest the following:
>
> Derived::PlainObject tmp(Vtri.transpose() * A);
> tmp.noalias() = T.transpose().triangularView<Upper>() * tmp;
> A.noalias() -= Vtri * tmp;
>
> There are still 2 temporaries, one for T and for tmp, but that's still
> better than 4. I'm also not 100% sure about the second line, because
> in theory a triangular matrix times a matrix can be performed in-place
> but I have to check the implementation. If you have troubles just
> remove the .noalias().
What is the purpose of noalias() ?
>
> But this still looks relatively expensive in term of temporaries,
> there must be something smarter.
>
> gael
>
> On Tue, Jun 15, 2010 at 4:08 PM, <vincent.lejeune@xxxxxxxxxx> wrote:
>> Here is my source code, the part of interest is at the top
>>
>> (HHUtilities<..>).
>>
>>
>>
>> The incrimined lines is the 47/48 : with line 47, everything works, but
>> it
>>
>> does a copy I would like to avoid.
>>
>> With line 48, I do no copy, but it crashs.
>>
>>
>>
>> "Derived" is Matrix<float,Dynamic,Dynamic> and OtherDerived is
>>
>> Block<MatrixType> where MatrixType is the previous
>>
>> Matrix<float,Dynamic,Dynamic> one.
>>
>>
>>
>> On Tue, 15 Jun 2010 16:02:04 +0200, Gael Guennebaud
>>
>> <gael.guennebaud@xxxxxxxxx> wrote:
>>
>>> Hi,
>>
>>>
>>
>>> could you resend your piece of code in a text file please, because
>>
>>> inlined in the email, it is really hard to read !
>>
>>>
>>
>>> Regarding the previous issues, what was it ?
>>
>>>
>>
>>>
>>
>>> cheers,
>>
>>> gael
>>
>>>
>>
>>>
>>
>>> On Tue, Jun 15, 2010 at 3:42 PM, <vincent.lejeune@xxxxxxxxxx> wrote:
>>
>>>>
>>
>>>> Hi,
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> I fixed the issue I had that prevent me from getting a working
>>
>>>>
>>
>>>> implementation of QR Householder decomposition, and apply Householder
>>
>> on
>>
>>>>
>>
>>>> the left/right function.
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> However, Im' currently facing another issue : my implementation is
>>
>> slower
>>
>>>>
>>
>>>> than the unblocked version of theses functions, because I'm relying
on
>>
>>>> the
>>
>>>>
>>
>>>> "triangularView" template.
>>
>>>>
>>
>>>> As I understand, matrix.triangularView() does not copy the matrix
>>
>> object,
>>
>>>>
>>
>>>> and provides some comfort. However, the following
>>
>>>>
>>
>>>> applyBlockHouseholderOnTheLeft function does trigger a segfault :
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> template<typename Derived, typename OtherDerived = Matrix<
>>
>>>>
>>
>>>> typename Derived::Scalar, Derived::ColsAtCompileTime, Dynamic> >
>>
>>>>
>>
>>>> class HHutilities
>>
>>>>
>>
>>>> {
>>
>>>>
>>
>>>> public:
>>
>>>>
>>
>>>> typedef Matrix<typename Derived::Scalar,
>>
>>>> Derived::ColsAtCompileTime,
>>
>>>>
>>
>>>> Derived::ColsAtCompileTime> TriangularFactorType;
>>
>>>>
>>
>>>> typedef typename ei_plain_diag_type<Derived>::type CoeffVect;
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> TriangularFactorType
>>
>>>>
>>
>>>> static
>>
>>>>
>>
>>>> BlockHouseholderTriangularFactor(const Derived& V,
>>
>>>>
>>
>>>> const CoeffVect& coeffs,const Derived& VV)
>>
>>>>
>>
>>>> {
>>
>>>>
>>
>>>> const int k = V.cols();
>>
>>>>
>>
>>>> ei_assert(V.cols()==k && V.rows()>=k && "Vector storing Matrix
>>
>>>>
>>
>>>> must have same columns than coeffs size and more rows than columns");
>>
>>>>
>>
>>>> TriangularFactorType ret = TriangularFactorType::Zero(k, k);
>>
>>>>
>>
>>>> const int j = V.rows();
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> for (int i = 0; i < k; i++)
>>
>>>>
>>
>>>> {
>>
>>>>
>>
>>>> ret.block(0, i, i, 1) = - coeffs(i)
>>
>>>>
>>
>>>> * V.block(i, 0, j - i, i).transpose() * V.block(i, i,
j
>>
>> -
>>
>>>>
<script type="text/javascript"
src="https://webmail.scilab.org/roundcube/program/js/tiny_mce/themes/advanced/langs/fr.js?s=1240817786"></script>gt;>
>>>> i, 1);
>>
>>>>
>>
>>>> ret.block(0, i, i, 1) = ret.block(0, 0, i, i) *
>>>> ret.block(0,
>>
>>>>
>>
>>>> i, i,
>>
>>>>
>>
>>>> 1);
>>
>>>>
>>
>>>> ret(i, i) = coeffs(i);
>>
>>>>
>>
>>>> }
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> return ret;
>>
>>>>
>>
>>>> }
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> static
>>
>>>>
>>
>>>> void
>>
>>>>
>>
>>>> applyBlockHouseholderOnTheLeft(OtherDerived& A, const Derived&
V,
>>
>>>>
>>
>>>> const CoeffVect& coeffs)
>>
>>>>
>>
>>>> {
>>
>>>>
>>
>>>> const TriangularView<Derived,UnitLower>& Vtmp = V.template
>>
>>>>
>>
>>>> triangularView<UnitLower> ();
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> const TriangularFactorType& T =
>>
>>>>
>>
>>>> BlockHouseholderTriangularFactor(Vtmp,
>>
>>>>
>>
>>>> coeffs,V);
>>
>>>>
>>
>>>> A -= Vtmp * T.transpose() * Vtmp.transpose() * A;
>>
>>>>
>>
>>>> }
>>
>>>>
>>
>>>> }
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> With OtherDerived=Block<MatrixType>, and
>>
>>>>
>>
>>>> MatrixType=Matrix<float,Dynamic,Dynamic>.
>>
>>>>
>>
>>>> I'm forced to make a copy of the TriangularView :
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> const Derived& Vtmp(V.template triangularView<UnitLower> ());
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> (instead of
>>
>>>>
>>
>>>> const TriangularView<Derived,UnitLower>& Vtmp = V.template
>>
>>>>
>>
>>>> triangularView<UnitLower> ()
>>
>>>>
>>
>>>> )
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> to get my code to work. However, this is a bad workaround, as it
makes
>>
>> an
>>
>>>>
>>
>>>> additional copy.
>>
>>>>
>>
>>>> The overhead is not that big (0.1 s on an old P4 3Ghz for a 256x256
>>
>>>>
>>
>>>> matrix), but it should not be there, as theoricaly, blocked
computation
>>
>>>> are
>>
>>>>
>>
>>>> better on big matrix...So I hope to improve things if I can get rid
of
>>
>>>> this
>>
>>>>
>>
>>>> avoidable matrix copy.
>>
>>>>
>>
>>>>
>>
>>>>
>>
>>>> Thanks,
>>
>>>>
>>
>>>> Vincent
>>
>>>>
>>
>>>>
>>
>>>>
0
0
Sample Matrix
0.680375 -0.444451 0.271423 -0.686642 0.22528 0.05349 -0.860489 -0.871657
-0.211234 0.10794 0.434594 -0.198111 -0.407937 0.539828 0.898654 -0.959954
0.566198 -0.0452059 -0.716795 -0.740419 0.275105 -0.199543 0.0519907 -0.0845965
0.59688 0.257742 0.213938 -0.782382 0.0485744 0.783059 -0.827888 -0.873808
0.823295 -0.270431 -0.967399 0.997849 -0.012834 -0.433371 -0.615572 -0.52344
-0.604897 0.0268018 -0.514226 -0.563486 0.94555 -0.295083 0.326454 0.941268
-0.329554 0.904459 -0.725537 0.0258648 -0.414966 0.615449 0.780465 0.804416
0.536459 0.83239 0.608353 0.678224 0.542715 0.838053 -0.302214 0.70184
Blocked Algorithm result
-1.62003 0.177362 0.0649518 -0.127252 -0.166401 -0.213105 1.45899 1.14112
-0.0918249 -1.35314 -0.0741249 -0.326037 0.00688639 -1.21692 -0.468914 -0.989225
0.24613 0.076811 1.70563 -0.121215 -0.0786798 0.655242 -0.298651 -0.62255
0.259468 0.298496 -0.00113579 1.81872 -0.292674 -0.349086 0.0195947 0.399353
0.357892 -0.03411 0.407074 -0.564285 1.19822 -0.414425 -0.222947 1.14277
-0.262953 -0.0973701 0.162526 0.25164 -0.629366 -0.289513 -0.35501 0.982133
-0.14326 0.580763 0.398768 -0.038568 0.325047 -0.5617 0.77036 -0.0991935
0.233202 0.696164 -0.0767187 -0.289584 -0.628682 -0.0434463 0.865306 -1.06959
Classic Algorithm result
-1.62003 0.177362 0.0649518 -0.127252 -0.166401 -0.213105 1.45899 1.14112
-0.0918249 -1.35314 -0.0741249 -0.326037 0.00688639 -1.21692 -0.468914 -0.989225
0.24613 0.076811 1.70563 -0.121215 -0.0786797 0.655242 -0.298651 -0.62255
0.259468 0.298496 -0.0011358 1.81872 -0.292674 -0.349086 0.0195946 0.399353
0.357892 -0.03411 0.407074 -0.564285 1.23959 -0.303922 -0.427091 1.07934
-0.262953 -0.0973701 0.162526 0.25164 -0.662358 -0.36521 0.0247902 0.780703
-0.14326 0.580763 0.398768 -0.038568 0.181 -0.0566173 0.894968 -0.25616
0.233202 0.696164 -0.0767187 -0.289584 -0.634447 -0.0454831 0.454151 -0.289448