Re: [eigen] Malloc-free dynamic matrices

On Thu, Mar 4, 2010 at 10:23 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

On Thu, Mar 4, 2010 at 9:49 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:

2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>
>
> On Thu, Mar 4, 2010 at 5:54 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
> wrote:
>>
>> 2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> >
>> > yes there are hundreds of way to achieve these goals without touching
>> > Matrix, but none of the solutions will be as convenient as having it by
>> > default directly inside Matrix. For instance, nobody would ever think
>> > about
>> > removing reserve from std::vector! I really don't see any valuable
>> > argument
>> > against doing so. Let me be clear, such a change won't affect at all the
>> > performance for normal use cases, while it will significantly speedup
>> > and
>> > simplify (via push* functions) the assembly of large matrices and
>> > vectors.
>> > So let's rather focus about implementing it inside Eigen.
>>
>> OK.
>>
>> > Regarding the question I asked about offering a 2D or 1D reserve, I
>> > would
>> > finally go for the 1D variant. The main reasons are:
>> > - consistent with the behavior of MaxRows/MaxCols
>> > - no stride => all Matrix objects keep the linear access flag
>> > - easier to make it consistent with other resizing features
>> >
>> > Everybody agree with that?
>>
>> I didn't follow the 1D vs 2D debate but at least i don't object. And
>> definitely, I want all Matrix objects to keep the linear access flag.
>>
>> >
>> > Ok, now we need an additional m_allocatedSize integer in the class
>> > Matrix
>> > (only for dynamically sized matrices, but without any option to
>> > enable/disable it, I want it to be the default and unique choice). Note
>> > that
>> > such a change implies an ABI change, and so it is important do discuss
>> > it
>> > now.
>>
>> Yes.
>>
>> > Moremover in order to be able to freely add new features in the future
>> > without breaking the ABI, I would even add at least one other integer,
>> > just
>> > in case. For instance to have dynamic flags, or so. I'm not 100% sure
>> > about
>> > that, but why not?
>> >
>> > To add these extra attributes, I see two options:
>> >
>> > The standard one using ei_int_if_dynamic<>. con: increases
>> > sizeof(MatrixXf)
>> >
>> > The second one is to put them at the begining of the dynamically
>> > allocated
>> > buffer. So more precisely, the allocation/dealocation would become:
>> >
>> > alloc:
>> > m_data = ei_aligned_new(size+<16 bytes>) + <16 bytes>;
>> > allocatedSize() = size;
>> >
>> > dealloc:
>> > ei_aligned_delete(m_data-<16bytes>);
>> >
>> > int& allocatedSize() {return (m_data-<16bytes>);}
>> >
>> > Disclaimer: yes the above is not C++, it is just to picture the idea!
>>
>> This looks like going only halfway toward heap-stored data. Instead,
>> why not take the bolder move of adding a d-pointer? We would put there
>> any additional data that is OK to access with non-inline functions.. So
>> we would keep directly as data members the array pointer m_data and
>> the dimensions m_rows and m_cols so we can still call
>> rows()/cols()/data() at zero cost (useful as they are used all the
>> time) but other less frequently used data could be deferred onto the
>> d-pointer and accessed through no-inline accessors.
>
> To be honest I don't see how adding a d-pointer can offer more flexibility.

With your approach, any additional member data that we may want to add
in the future, has to fit in the fixed number of bytes that were
reserved, like 8 bytes or 16 bytes. We have to decide once and for all
how much space we reserve for additional members. Moreover, once we've
added a member, we have to keep its offset fixed forever. All of that
can theoretically be overcome by using a d-pointer.

hm... maybe I've been clear but as "my proposal" I was referring to the solution of storing the D_structure in the dynamically allocated memory, i.e., with the data.

> My proposal affords the same with less memory and runtime overhead: a true
> d-pointer would requires in addition one pointer, one call to malloc,

Yep, I thought about that just after sending the e-mail. The solution
might be to merge this idea with your idea: allocate at once the
matrix array and the D_structure. But in order to allow the
D_structure to grow in the future, place it after the array, not
before, and access it only with non-inline accessors .... now here's
the catch... that must be compiled into a shared library :( I didn't
think about that in my previous e-mail, but the d-pointer approach can
only work if we have a binary shared library :( Though at that point,
having such a tiny library would solve a bunch of problems at once
(cache size parameters, etc). I don't know what to think about that.

You cannot easily put it at the end because ideally we would store the allocatedSize variable in the D_structure, and if you put it at the end of the data, you need the allocatedSize to access to the D_structure... Since this whole approach can only work via a shared library, we can still put the D_structure at the beginning, in which case we also need to have a:
S_structure* Matrix::d()
function in the shared library too.

Or you can also put the D_structure at rows*cols, so not necessarily after the reserved data. Both approaches are even IMO.

> and
> the extra memory used by the system to manage the allocated memory.

That also goes away once we unify the 2 malloc calls into one, see
above (like in your proposal).

> The main
> concern is of course the additional malloc call per object. My proposal
> requires only sizeof(D_structure) which must be a multiple of 16bytes to
> preserve alignment.

So, with my above updated proposal, this concern too goes away.

> Actually, after thinking about it, I'm not sure we can really add new stuff
> in what I called above the "D_structure" without breaking the ABI, or
> without adding a binary library.

Hehe, exactly what I didn't think of in my previous email :(
See above.

> Indeed, let's assume we have a library L
> (e.g., an optimized solver based on Eigen and using Eigen's Matrix for its
> public interface) and an application A which uses L via dynamic linkage. In
> 3.1 we add a new variable in the D_structure of Matrix. Then the library L
> is recompiled against Eigen 3.1 to get advantage of this shiny new feature.
> Theoretically, the application A which was linked against Eigen 3.0 should
> still work. And to make that happens we must make sure that Matrix objects
> created by the application A are created with the Eigen 3.1 code to properly
> create and initialize the D_structure. And achieve this I don't see other
> solutions than having a "ei_create_and_initialize_D_structure()" function in
> a binary libeigen.so (or .dll).

Totally agree.

>> > This approach has many advantages:
>> > - sizeof(MatrixXf) remains minimal (good to pass/return MatrixXf by
>> > value,
>> > etc.)
>>
>> I don't understand this.
>>
>> If we pass/return a MatrixXf by value, the copy constructor gets call,
>> which deep-copies the whole array anyway. So how good is it to have a
>> smaller size on the stack? Do you have copy-on-write in mind?
>
> I just wanted to emphasize that with this approach sizeof(MatrixXf) does not
> increase, in case some people care. The example was bad indeed.

ok.

>> > - we have 12bytes free for future uses
>>
>> That we can have either way: on the heap (your approach) or on the stack.
>>
>> Here's another comment. Why do you count in bytes? We only need to
>> preserve the ABI _within_ each given architecture, so we can define
>> the ABI in terms of sizeof(int) and sizeof(void*). So instead of
>> reserving, say, 16 bytes, i'd rather be talking in terms of
>> "2*sizeof(int)+sizeof(void*)"
>
> It's just that the offset must be a multiple of 16bytes to preserve
> alignment. In practice, of course it is better to define a D_structure, and
> take its size with some padding.

First of all, regardless of whether we do a D pointer or not, I'd
suggest putting the D structure after the matrix array, not before. At
least that allows to add arbitrarily much member data in the future,
instead of hardcoding a hard limit today.

Then the alignment requirement on D is softened a lot, though it still
wouldn't hurt to guarantee some alignment. Especially if we allow
matrices with sizeof(Scalar) very small, the pointer at the end of the
array could have very poor alignment. So why not 16 bytes, although
unless we put SIMD stuff in the D structure 8 bytes should be enough.

>> > And for the cons:
>> > - well I cannot see any, maybe it is a bit hackish but who cares?
>>
>> Indeed there are not too many inconvenients. I'm just checking with
>> you if there are advantages? :)
>
> Indeed. It seems there might advantages only if we add a binary library.
> One more argument to add a binary.

So you'd be in favor of adding a tiny binary library?

yep, that's the big question.

I'm completely hesitating, I can't make a decision on that. I guess
that if we treat this issue simultaneously with other issues that
would benefit from a binary lib, such as cache size runtime
parameters, then the case for a binary lib gets quite strong. On the
other hand it will require good communication and documentation, it
would be great to keep it optional (maybe make its code optionally
available as a header file...), and it should be WTFPL-licensed.

Same here, though I become more and more in favor to a shared library as it might solve many issues.

Regarding runtime settings without a shared lib, I was thinking about using a static variable inside a function:

// internal
int manage_cache_size(enum action,int v=0)
{
static int value = EIGEN_DEFAULT_CACHE_SIZE;
if(action="" value = v;
if(action="" return value;
return value;

}

// public:
int cacheSize() { return manage_cache_size(get); }
void setCacheSize(int v) { manage_cache_size(set,v); }

but I'm really unsure about that...

Looks like I shot my own feet... ;)
With gcc, if I declare manage_cache_size inline (and not static of course), that works fine. I tried with multiple object files calling manage_cache_size and also with a shared lib and one application. Is is clear enough?

gael.

gael.

Benoit