Re: [eigen] Back from google

On Wed, Nov 4, 2009 at 12:19 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
2009/11/4 Keir Mierle <mierle@xxxxxxxxx>:
>> * Just use sqrt() ? The questions are: are we OK to always pay for a
>> sqrt here? This seems like unneeded extra performance degradation for
>> small dyn-size matrices (of course it's negligible for large
>> matrices). In the default case of a compile-time constant, GCC >= 4.3
>> is able to compute the sqrt at compile-time, but i wonder about MSVC
>> and ICC.
> If the variable is encapsulated in a binary library, then the square root
> can also be stored in a variable so that it is only recomputed if the size
> changes; for example:
> void Eigen::SetCacheBlockSize(int new_size) {
>   ei_cache_block_size = new_size;
>   ei_sqrt_cache_block_size = sqrt(new_size);
> }
>> * Introduce a separate preprocessor #define to let the user specify a
>> runtime cache size variable name?
>> Or going farther into the direction of binary libraries:
>> * Introduce a preprocessor symbol to define some global variables,
>> e.g. one for the cpu cache size? We'd need to tell the user that if he
>> wants to use the corresponding feature, then one of his source file
>> must define that macro before #including Eigen.
>> There is of course always the option of creating an optional tiny
>> binary lib, but i still don't see a compelling reason to do that...
> I don't think a small binary library is crazy, provided it is optional and
> disabled by default.

"optional and disabled by default" works well indeed to address needs
of a company like Google where someone will take care of enabling that
feature. But it doesn't work so well with linux distros who will want
to stick to the default configuration, or will somehow pressure us to
make that the default as soon as some package requires eigen to be
configured with this option.

So i'm not ruling out a binary library, but i wanted to mention the downsides.

There are two issues with packaging Eigen for linux distributions. The first is packaging the headers for developers to use in development of other applications. The second is packaging applications such as Blender or VTK that depend on Eigen.

In the first case, there are no issues because there are only headers. In my proposed solution (below), this does not change because there would still be no separate binary library.

In the second case, at the moment, there is also no issue because Eigen has no binary component; apps which depend on Eigen have Eigen's code intermingled with their application code via templates. However, this isn't optimal because only one binary is shipped to all users. The predefined cache size won't provide the best performance for everyone.

Unfortunately, creating a binary package for Eigen is clearly undesirable from multiple perspectives. It takes control over how Eigen is compiled away from downstream developers; it increases the packaging burden; it decreases flexibility; etc.

What I suggest instead is that we provide an "EigenInternals.cpp" or similar file, which must be #included by the application somewhere. EigenInternals.cpp will contain Eigen-specific global variables, and functions to manipulate them.

That way, for the first case, nothing changes because the EigenInternals.cpp file is distributed alongside the include files. No binary is required. In the second case, since each binary application will have compiled in its own private copy of EigenInternals.cpp, they can each be compiled with different options.

This is also a win for anyone releasing binaries for Windows, where casual recompilation is rare.
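
To make the proposal concrete, here is a minimal sketch of what such a file might contain. Everything in it is an assumption for illustration: the accessor names, the placeholder default of 256, and the choice of `double` for the cached root are not existing Eigen API. Only `SetCacheBlockSize` and the two `ei_` variable names come from the snippet quoted earlier in this thread.

```cpp
// EigenInternals.cpp -- hypothetical sketch, not an actual Eigen file.
// Intended to be #included by exactly one source file of the application,
// so that the globals below are defined once per binary.
#include <cmath>

namespace Eigen {

// Global state that a purely header-only library has no place to define.
// 256 is a placeholder default, not a tuned value.
int ei_cache_block_size = 256;
double ei_sqrt_cache_block_size = 16.0;  // kept in sync by SetCacheBlockSize()

// Change the block size at runtime and refresh the cached square root,
// so the sqrt is paid only when the size changes, not on every product.
void SetCacheBlockSize(int new_size) {
  ei_cache_block_size = new_size;
  ei_sqrt_cache_block_size = std::sqrt(static_cast<double>(new_size));
}

// Hypothetical read accessors for the rest of the library.
int CacheBlockSize() { return ei_cache_block_size; }
double SqrtCacheBlockSize() { return ei_sqrt_cache_block_size; }

}  // namespace Eigen
```

An application would #include this file from one of its own .cpp files and call `Eigen::SetCacheBlockSize(...)` once at startup; every other translation unit would only need declarations of these functions.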


> This would also open the door to us determining cache
> sizes at startup, so the user doesn't have to.

Correct me if i'm wrong, as i'm really not an expert, but i thought
that knowing the CPU cache size alone is only enough information if we can
count on having all the CPU cache for ourselves in 1 single thread,
right? In general, it seemed to me that only the user could tell how
much cpu cache we could count on using. So, how is it useful to
determine the cpu cache size automatically?

I don't know the details, but I believe the cache is yours until the next context switch.
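
For what it's worth, querying the cache size at startup could look roughly like the sketch below. It assumes a Unix-like system; the `_SC_LEVEL1_DCACHE_SIZE` name is a glibc extension to `sysconf`, not part of POSIX, so the code falls back to a caller-supplied value wherever it is missing or reports nothing useful. The function name `DetectL1DataCacheBytes` is made up for this example. Note that this only reports the hardware size; it says nothing about how much of the cache a given thread can actually count on.

```cpp
// Hypothetical startup probe for the L1 data cache size.
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>  // sysconf
#endif

// Returns the L1 data cache size in bytes, or fallback_bytes when the
// platform does not expose it (or reports zero or an error).
long DetectL1DataCacheBytes(long fallback_bytes) {
#if defined(_SC_LEVEL1_DCACHE_SIZE)
  const long size = sysconf(_SC_LEVEL1_DCACHE_SIZE);
  if (size > 0) return size;
#endif
  return fallback_bytes;
}
```

The result could then be used to pick the block size once at startup, with the user still able to override it.
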
> It's extra maintenance, but
> going forward it may be required if we wish Eigen to get used in prebuilt
> software. For example, consider packages depending on Eigen compiled for
> debian, or libmv when eventually bundled with blender.

I do understand that the current solution of having the cache size
known at build time, is not satisfactory :)

> There was another item brought up by a Googler, who couldn't make lunch,
> which was that there aren't any benchmarks on the wiki comparing sparse
> performance. It would be nice if the benchmark suite included comparisons
> against, e.g. gmm++ and ublas. I realize most of the solvers are implemented
> by other backends, but things like sparse matrix multiplication are still
> done natively.

---> that would be nice indeed! And an accessible "junior job" for
someone wanting to contribute! The next question is whether that can be
done in BTL; that i don't know.

> Thanks for joining us for lunch!

Thanks a ton for the invitation!
Everyone: Google offices live up to their reputation, the place is a
sort of hacker paradise with palm trees, free food, and technical chat
all over the place, from what i could see.


> Keir
>> Benoit
