Re: [eigen] matrix::linspace
- To: Eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [eigen] matrix::linspace
- From: Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
- Date: Sat, 23 Jan 2010 13:39:30 +0100
Oh, just for the sake of completeness, this is how I am testing. Maybe
something is wrong over here:
{
  Timer t; double a, e1, e2;
  double low = 5.0;
  double high = 7.0;
  const int size = 200000;
  double step = (high-low)/(size-1);  // (not used in this particular comparison)

  // nullary-expression version
  {
    ArrayXd test;
    a = t.getTimeStamp();
    for (int i=0; i<5000; ++i)
      test = ArrayXd::NullaryExpr(size,
        ei_linspace_op<double>(low,high,size));
  }
  e1 = t.getTimeStamp()-a;

  // hand-written loop
  a = t.getTimeStamp();
  {
    ArrayXd test(size);
    for (int i=0; i<5000; ++i)
      for (int j=0; j<size; ++j)
        test(j) = low+j;
  }
  e2 = t.getTimeStamp()-a;

  std::cout << "vec: " << e1*1000.0 << " ms" << std::endl;
  std::cout << "lin: " << e2*1000.0 << " ms" << std::endl;
}
- Hauke
On Sat, Jan 23, 2010 at 1:37 PM, Hauke Heibel
<hauke.heibel@xxxxxxxxxxxxxx> wrote:
> Hi there,
>
> I am working on some functionality which would allow something like this:
>
> VectorXd a = VectorXd::linspace(low,high,num_steps);
>
> which should correspond to:
>
> const double step_size = (high-low)/(num_steps-1);
> VectorXd a(num_steps);
> for (int i=0; i<num_steps; ++i)
>   a(i) = low+i*step_size;
>
> In Eigen, this is best implemented via NullaryExpressions, i.e.
> matrices where each element is computed by some functor. In theory,
> this should not only be nice from a usability point of view, but I
> also hoped to gain from vectorization. It turns out that this is not
> that easy - maybe due to MSVC, but maybe also due to my lack of SSE
> skills. For the most simple example
>
> for (int i=0; i<num_steps; ++i) a(i) = i;
>
> I managed to implement the functor's method ei_linspace_op::packetOp
> such that we actually do gain from vectorization, as follows (note
> that I added integer index parameters to it):
>
> // for doubles only
> ei_linspace_op::packetOp(int i, int = 0)
> {
>   return ei_padd(ei_pset1<Scalar>(i), ei_pset<Scalar>(0,1));
> }
>
> For me the first surprise was that this performs way better than just
> returning ei_pset<Scalar>(i,i+1).
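>
> For reference, the scalar (non-vectorized) path of such a functor is
> straightforward - roughly like this (the member names and the exact
> operator() signature that the nullary expression expects from the
> functor are simplified/assumed here):
>
> // scalar-only sketch; the vectorized packetOp above comes on top of this
> template<typename Scalar>
> struct ei_linspace_op_scalar_sketch {
>   ei_linspace_op_scalar_sketch(Scalar low, Scalar high, int size)
>     : m_low(low), m_step((high-low)/Scalar(size-1)) {}
>   Scalar operator()(int i, int = 0) const { return m_low + Scalar(i)*m_step; }
>   Scalar m_low, m_step;
> };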
>
> Next, I looked into the case:
>
> for (int i=0; i<num_steps; ++i) a(i) = low+i;
>
> and here, I totally failed. Returning
>
> return ei_pset<Scalar>(m_low+i,m_low+i+1);
>
> resulted in no statistically meaningful speedup (~0.06%). The same
> holds for
>
> return ei_padd(ei_pset1<Scalar>(i),ei_padd(ei_pset1(m_low),ei_pset<Scalar>(0,1)));
>
> which is sometimes even slower, and permuting the operators did not help either.
>
> So, I am sort of puzzled about what I could improve. Any ideas? Is
> there any SSE operation I am not aware of for creating (a, a+1, a+2, ...)?
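>
> One more variant that comes to mind - untested over here, so take it
> as a sketch with made-up names - would be to hoist the constant packet
> out of packetOp and into the functor's constructor, so that each call
> only does one broadcast plus one padd:
>
> // doubles only, as above; m_base holds (low, low+1) and is built once
> template<typename Scalar> struct ei_linspace_op_sketch {
>   typedef typename ei_packet_traits<Scalar>::type Packet;
>   ei_linspace_op_sketch(Scalar low) : m_base(ei_pset<Scalar>(low, low+1)) {}
>   Packet packetOp(int i, int = 0) const
>   { return ei_padd(ei_pset1<Scalar>(i), m_base); }
>   Packet m_base;
> };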
>
> Finally, note that ei_pset is not yet in PacketMath. Over here, I
> added forward declarations into GenericPacketMath and the actual SSE
> implementation in the corresponding PacketMath header (no Altivec
> yet). The GenericPacketMath declarations (as the name says) have no
> implementations, since I don't see any meaningful way of providing
> them, nor do I think they would be useful. The second change in the
> lib was mentioned above - I added index parameters to the packetOp
> functions.
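>
> For doubles, the SSE part boils down to something along these lines
> (this is just a sketch, the exact signatures may differ):
>
> // GenericPacketMath: declaration only, no generic definition
> template<typename Scalar> typename ei_packet_traits<Scalar>::type
> ei_pset(const Scalar& a, const Scalar& b);
>
> // SSE PacketMath: build the packet (a, b), i.e. a ends up in element 0
> // (here __m128d plays the role of ei_packet_traits<double>::type)
> template<> inline __m128d ei_pset<double>(const double& a, const double& b)
> { return _mm_setr_pd(a, b); }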
>
> If everything regarding the vectorization fails, we could still
> provide the functionality (with a specialization for the trivial case
> where we do gain from vectorizing) because, at the very least, I
> think it is nifty.
>
> - Hauke
>