[eigen] RcppEigen, providing bindings from R to Eigen through Rcpp, has been released |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] RcppEigen, providing bindings from R to Eigen through Rcpp, has been released
- From: Douglas Bates <bates@xxxxxxxxxxxxx>
- Date: Tue, 28 Jun 2011 12:33:29 -0500
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type; bh=bk+3vdlfeKiNaVSAl6NYfCPikLLF3LPTT2pnXxnVEag=; b=Gm6QRVYYwoSTf3tHiO3VOrUGXT8DquD0O0tGQIrGVLl+X7BalSoicIcjSt7WeknJJn vaZwuowyTl8vlqFu5bGzA8KaxYAwJHS2/EkVHDxtU8MGcXJjpYI92uqQAcBHrve8ZesM 9YT3Hhwp2OoWqTGuQ5gJXwlk7GQFAV4fXyAr4=
An initial release of the R package RcppEigen, which provides bindings
to Eigen from R using the Rcpp template package for R objects in C++,
is now available on CRAN, the Comprehensive R Archive Network. See
http://cran.r-project.org/web/packages/RcppEigen or the equivalent
location on one of the mirror sites such as
http://cran.us.r-project.org/
R is a language and system for statistical analysis and graphics. See
http://www.R-project.org or http://en.wikipedia.org/wiki/R_language
for more details. It is a interpreted, functional programming
language based on self-describing objects, and provides for extension
through user-contributed R packages. The Rcpp package provides
templated C++ classes representing the underlying R object types. The
RcppEigen package provides linkage to Eigen classes through the Rcpp
classes. Several examples of using Eigen classes to fit a linear
model and provide the standard errors of the coefficient estimates are
given in the source package file RcppEigen/src/fastLm.cpp. The source
package is provided as a .tar.gz file on the CRAN site mentioned
above. Comments on and enhancements to this example code are welcome.
I'm still a relative neophyte in Eigen.
There is a simple benchmark of these methods that can be run in the
following way.
1. Install R
2. Install the RcppEigen package by starting R and entering
install.packages("RcppEigen")
This will also install the Rcpp package.
3. In the current R session or in any other R session enter
library(RcppEigen) # attaches the RcppEigen package
source(system.file("examples", "lmBenchmark.R", package="RcppEigen"), echo=TRUE)
This runs 20 replications of fitting a linear model with model
matrix of size 100000 by 40. Other problem sizes and numbers of
replications can be specified with the optional arguments n, p and
nrep to the function do_bench(). If the RcppArmadillo and/or the
RcppGSL packages are available methods based on them are also used.
On my desktop (Athlon II X4 2.9 GHz, 8 GB PC1333 memory,
single-threaded atlas compiled on this machine) the results are
> do_bench()
lm benchmark for n = 100000 and p = 40: nrep = 20
test relative elapsed user.self sys.self
7 LLt 1.000000 1.132 1.14 0.00
3 LDLt 1.019435 1.154 1.16 0.00
5 SymmEig 2.932862 3.320 2.56 0.75
2 PivQR 7.891343 8.933 8.26 0.67
8 arma 7.897527 8.940 8.64 0.02
6 QR 8.679329 9.825 9.07 0.74
1 lm.fit 13.082155 14.809 13.38 1.40
4 SVD 64.440813 72.947 71.86 1.01
9 GSL 156.208481 176.828 175.86 0.80
The lm.fit timing is from the native R implementation based on a
modified QR decomposition from Linpack (hence using only level-1
BLAS). The reason for using Linpack is because we want a specific
column pivoting scheme, which is to retain the original column order
except in the case or apparent singularity when the current pivot
column is moved to the right hand side via a cyclic shift.
Strictly speaking the lm.fit results should only be compared to
rank-revealing decompositions of which, at present, only the PivQR
method based on the colPivHouseholderQr method is a rank-revealing
implementation. However, the SymmEig and SVD methods can and will be
adapted. I think that the LDLt method can also be adapted even though
the pivoting scheme doesn't guarantee decreasing magnitudes of the
elements of the diagonal vector D.
The RcppGSL implementation uses the gsl_multifit_linear function that
is based on a singular value decomposition. The RcppArmadillo
implementation uses the arma::solve function for overdetermined
systems. It is based on the Lapack subroutine dgels.
On another machine using the reference BLAS I get
lm benchmark for n = 100000 and p = 40: nrep = 20
test relative elapsed user.self sys.self
3 LDLt 1.000000 1.593 1.591 0.000
7 LLt 1.001255 1.595 1.592 0.001
5 SymmEig 2.787822 4.441 3.674 0.758
6 QR 6.760829 10.770 10.010 0.752
2 PivQR 6.914626 11.015 10.225 0.777
1 lm.fit 9.943503 15.840 14.288 1.537
8 arma 13.971751 22.257 22.202 0.010
4 SVD 41.958569 66.840 65.609 1.158
9 GSL 162.653484 259.107 257.944 0.932
Bumping up the size to 1 000 000 by 100 I get, on the machine with atlas,
> do_bench(1000000, 100, 10)
lm benchmark for n = 1e+06 and p = 100: nrep = 10
test relative elapsed user.self sys.self
3 LDLt 1.000000 25.605 25.49 0.10
7 LLt 1.047491 26.821 26.70 0.10
5 SymmEig 1.882054 48.190 66.61 10.05
6 QR 7.443937 190.602 181.92 8.56
8 arma 8.933138 228.733 224.29 4.22
2 PivQR 8.981644 229.975 221.70 8.11
1 lm.fit 12.003163 307.341 272.16 34.91
4 SVD 78.411873 2007.736 2012.31 21.72
9 GSL 210.777465 5396.957 5382.61 10.36
and on the machine with the reference BLAS
> do_bench(1000000L, 100L, 20L, TRUE)
lm benchmark for n = 1000000 and p = 100: nrep = 20
test relative elapsed user.self sys.self
3 LDLt 1.000000 69.297 69.096 0.174
6 LLt 1.039295 72.020 71.750 0.203
4 SymmEig 2.828102 195.979 176.954 18.961
5 QR 8.990534 623.017 603.931 18.530
2 PivQR 12.731807 882.276 863.076 18.474
1 lm.fit 17.585884 1218.649 1133.541 84.174
7 arma 30.593186 2120.016 2108.481 10.087