Re: [eigen] Expression templates

On Fri, Jul 30, 2010 at 5:26 PM, Carlos Becker <carlosbecker@xxxxxxxxx> wrote:

Thanks, Benoit, for the quick answer. Actually, the problem is a bit more complicated: I was hiding some of the real code to make it simpler. This is an example of what I have inside my code:
mFFT = mMicDelayed[0][ mGrid.mDelay[iX][iY][iZ][0] ] +
       mMicDelayed[1][ mGrid.mDelay[iX][iY][iZ][1] ] +
       mMicDelayed[2][ mGrid.mDelay[iX][iY][iZ][2] ] +
       mMicDelayed[3][ mGrid.mDelay[iX][iY][iZ][3] ] +
       mMicDelayed[4][ mGrid.mDelay[iX][iY][iZ][4] ] +
       mMicDelayed[5][ mGrid.mDelay[iX][iY][iZ][5] ] +
       mMicDelayed[6][ mGrid.mDelay[iX][iY][iZ][6] ] +
       mMicDelayed[7][ mGrid.mDelay[iX][iY][iZ][7] ];
It looks a bit strange, but it lets me optimize and reuse many operations. In this case the multiplications were already done beforehand, since I am reusing them (multiplying by exp(-j*w) in the frequency domain lets me do time shifts). Anyway, I believe I could live with my code being something like:

if (N == 8)
  // expand manually for N = 8, such as the code I posted above
else if (N == 16)
..

and trust GCC to optimize it, since N is a template parameter. I was looking for a more elegant way to do this, but I guess it would take more time than I was planning to spend on it.
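
The manual expansion for each N can, in principle, be generated by the compiler instead. What follows is only a sketch under stated assumptions, not something taken from this thread or from Eigen: assign_unrolled_sum is a hypothetical helper, it needs C++17 fold expressions, and std::valarray stands in for ArrayXf so that the example compiles without Eigen. The whole sum is built as a single expression and assigned once, mirroring the hand-written N == 8 version.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <valarray>

// Hypothetical helper (not part of Eigen): expands to
//   out = ((term(0) + term(1)) + ...) + term(N-1);
// at compile time, using a C++17 left fold over the index pack.
template <typename Out, typename Term, std::size_t... Is>
void assign_unrolled_sum_impl(Out& out, Term& term, std::index_sequence<Is...>) {
    out = (... + term(Is));
}

template <std::size_t N, typename Out, typename Term>
void assign_unrolled_sum(Out& out, Term&& term) {
    static_assert(N > 0, "a fold over + needs at least one term");
    assign_unrolled_sum_impl(out, term, std::make_index_sequence<N>{});
}

// Usage sketch with toy data: delayed[k][d] plays the role of
// mMicDelayed[k][...], and delay[k] plays the role of mGrid.mDelay[...][k].
std::valarray<float> demo_sum() {
    constexpr std::size_t N = 4;
    std::valarray<float> delayed[N][2];
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t d = 0; d < 2; ++d)
            delayed[k][d] = std::valarray<float>(static_cast<float>(k + d), 3);

    const std::size_t delay[N] = {0, 1, 1, 0};
    std::valarray<float> result(3);

    // The lambda returns a const reference so that no array is copied per term.
    assign_unrolled_sum<N>(result,
        [&](std::size_t k) -> const std::valarray<float>& {
            return delayed[k][delay[k]];
        });
    return result;
}
```

With Eigen arrays the same pattern should apply, as long as the lambda returns a const reference to each array as above. Whether it beats the plain loop would have to be measured.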

Thanks
Carlos

On Fri, Jul 30, 2010 at 4:04 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
2010/7/30 Carlos Becker <carlosbecker@xxxxxxxxx>:

> Hi all again,
> I am doing some audio processing and I am working with a set of N arrays
> (ArrayXf in this case). At a certain point, I am doing something like this:
> ArrayXf eachArray[N]; // this is pre-loaded before
> ArrayXf arrayMult[N]; // this is also pre-loaded
> ArrayXf result;
> result = eachArray[0] * arrayMult[0] + eachArray[1] * arrayMult[1] + ....
> + eachArray[N-1] * arrayMult[N-1];
>
> This operation is performed on a templated class that takes N as a template
> parameter. I made some tests and, as expected, doing something like
> result.setZero();
> for (int i=0; i < N; i++)
>   result += eachArray[i] * arrayMult[i];
> is not the best option, and neither is
> result = eachArray[0] * arrayMult[0];
> for (int i=1; i < N; i++)
>   result += eachArray[i] * arrayMult[i];
>
> This is probably because of memory accessing speed, so I was looking for a
> way to 'unroll' this loop properly. Is there any template/class in Eigen
> that would allow me to do this easily?

Why don't you store your array-of-arrays as a 2-dimensional Eigen array
to start with? Then you can do a partial reduction such as a rowwise sum.

typedef Array<float,Dynamic,N> ArrayOfNArrays;
ArrayOfNArrays eachArray;
ArrayOfNArrays arrayMult;
ArrayXf result = (eachArray * arrayMult).rowwise().sum();

This should be perfectly unrolled and vectorized by Eigen; if it's
not, please file a bug.

Benoit

> Thanks!
> Carlos