[eigen] Matrix multiplication seems to be exceptionally slow in one specific case

I am trying to implement a whitening transform using Eigen3, but one of the matrix operations is extremely slow and I can't understand why. Below is the code:

    MatrixXd M = convertToEigen(data);
    MatrixXd mu = M.rowwise() - M.colwise().mean();
    MatrixXd S = (mu.adjoint()*mu)/double(M.rows());
    EigenSolver<MatrixXd> es(S, true);
    auto V(es.eigenvalues());
    auto U(es.eigenvectors());
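    // Square roots of the eigenvalues, kept as a vector
    V = V.cwiseSqrt();
    // L is diagonal with 1/sqrt(eigenvalue); zero it first so off-diagonals are defined
    MatrixXd L = MatrixXd::Zero(S.rows(), S.cols());
    for(size_t d = 0; d < L.rows(); d++)
        L(d, d) = 1./real(V(d));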
    MatrixXd Y(M.rows(), M.cols());
    auto T = L*U.real().transpose();
    for(size_t n = 0; n < M.rows(); n++)
        Y.row(n) = T*(M.row(n) - mu);

    Y = Y.transpose();

I timed some of the operations and found that the T*() product inside the for-loop is the bottleneck. It takes about 0.047688 seconds on a 4th-generation Intel i7, which is about the same as the eigenvalue decomposition (0.049989 seconds), so that can't be right. M is a 12498 x 225 matrix, so T is 225 x 225 and so is mu.
I am running Kubuntu 14.04 with Eigen 3.2.0-8 from the repositories, and all optimisation flags are turned on. Does anybody know what the problem is?
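
For anyone who wants to reproduce the measurements, here is a minimal sketch of how a single statement can be timed with std::chrono. It is not necessarily how the figures above were obtained. The names time_seconds and matvec are hypothetical helpers made up for this illustration, and matvec is only a plain stand-in workload so that the snippet compiles without Eigen.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs f() once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename F>
double time_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in workload (assumption: not the Eigen code from the post).
// Computes y = A * x for a row-major n x n matrix A.
std::vector<double> matvec(const std::vector<double>& A,
                           const std::vector<double>& x,
                           std::size_t n) {
    assert(A.size() == n * n && x.size() == n);
    std::vector<double> y(n, 0.0);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            y[r] += A[r * n + c] * x[c];
    return y;
}
```

To time one line of the code above, the statement would be wrapped in a lambda, for example time_seconds([&] { Y.row(n) = T*(M.row(n) - mu); }). Timing the whole loop as well as a single iteration makes it clear whether a figure is per iteration or a total.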

Best regards,

