Re: [eigen] Malloc-free dynamic matrices

On Thu, Mar 4, 2010 at 5:54 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:

2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:

>
> yes there are hundreds of way to achieve these goals without touching
> Matrix, but none of the solutions will be as convenient as having it by
> default directly inside Matrix. For instance, nobody would ever think about
> removing reserve from std::vector! I really don't see any valuable argument
> against doing so. Let me be clear, such a change won't affect at all the
> performance for normal use cases, while it will significantly speedup and
> simplify (via push* functions) the assembly of large matrices and vectors.
> So let's rather focus about implementing it inside Eigen.

OK.

> Regarding the question I asked about offering a 2D or 1D reserve, I would
> finally go for the 1D variant. The main reasons are:
> - consistent with the behavior of MaxRows/MaxCols
> - no stride => all Matrix objects keep the linear access flag
> - easier to make it consistent with other resizing features
>
> Everybody agree with that?

I didn't follow the 1D vs 2D debate but at least i don't object. And
definitely, I want all Matrix objects to keep the linear access flag.

>
> Ok, now we need an additional m_allocatedSize integer in the class Matrix
> (only for dynamically sized matrices, but without any option to
> enable/disable it, I want it to be the default and unique choice). Note that
> such a change implies an ABI change, and so it is important do discuss it
> now.

Yes.

> Moremover in order to be able to freely add new features in the future
> without breaking the ABI, I would even add at least one other integer, just
> in case. For instance to have dynamic flags, or so. I'm not 100% sure about
> that, but why not?
>
> To add these extra attributes, I see two options:
>
> The standard one using ei_int_if_dynamic<>. con: increases sizeof(MatrixXf)
>
> The second one is to put them at the begining of the dynamically allocated
> buffer. So more precisely, the allocation/dealocation would become:
>
> alloc:
> m_data = ei_aligned_new(size+<16 bytes>) + <16 bytes>;
> allocatedSize() = size;
>
> dealloc:
> ei_aligned_delete(m_data-<16bytes>);
>
> int& allocatedSize() {return (m_data-<16bytes>);}
>
> Disclaimer: yes the above is not C++, it is just to picture the idea!

This looks like going only halfway toward heap-stored data. Instead,
why not take the bolder move of adding a d-pointer? We would put there
any additional data that is OK to access with non-inline functions. So
we would keep directly as data members the array pointer m_data and
the dimensions m_rows and m_cols so we can still call
rows()/cols()/data() at zero cost (useful as they are used all the
time) but other less frequently used data could be deferred onto the
d-pointer and accessed through no-inline accessors.

To be honest I don't see how adding a d-pointer can offer more flexibility. My proposal affords the same with less memory and runtime overhead: a true d-pointer would requires in addition one pointer, one call to malloc, and the extra memory used by the system to manage the allocated memory. The main concern is of course the additional malloc call per object. My proposal requires only sizeof(D_structure) which must be a multiple of 16bytes to preserve alignment.

Actually, after thinking about it, I'm not sure we can really add new stuff in what I called above the "D_structure" without breaking the ABI, or without adding a binary library. Indeed, let's assume we have a library L (e.g., an optimized solver based on Eigen and using Eigen's Matrix for its public interface) and an application A which uses L via dynamic linkage. In 3.1 we add a new variable in the D_structure of Matrix. Then the library L is recompiled against Eigen 3.1 to get advantage of this shiny new feature. Theoretically, the application A which was linked against Eigen 3.0 should still work. And to make that happens we must make sure that Matrix objects created by the application A are created with the Eigen 3.1 code to properly create and initialize the D_structure. And achieve this I don't see other solutions than having a "ei_create_and_initialize_D_structure()" function in a binary libeigen.so (or .dll).

>
> This approach has many advantages:
> - sizeof(MatrixXf) remains minimal (good to pass/return MatrixXf by value,
> etc.)

I don't understand this.

If we pass/return a MatrixXf by value, the copy constructor gets call,
which deep-copies the whole array anyway. So how good is it to have a
smaller size on the stack? Do you have copy-on-write in mind?

I just wanted to emphasize that with this approach sizeof(MatrixXf) does not increase, in case some people care. The example was bad indeed.

> - we have 12bytes free for future uses

That we can have either way: on the heap (your approach) or on the stack.

Here's another comment. Why do you count in bytes? We only need to
preserve the ABI _within_ each given architecture, so we can define
the ABI in terms of sizeof(int) and sizeof(void*). So instead of
reserving, say, 16 bytes, i'd rather be talking in terms of
"2*sizeof(int)+sizeof(void*)"

It's just that the offset must be a multiple of 16bytes to preserve alignment. In practice, of course it is better to define a D_structure, and take its size with some padding.

>
> And for the cons:
> - well I cannot see any, maybe it is a bit hackish but who cares?

Indeed there are not too many inconvenients. I'm just checking with
you if there are advantages? :)

Indeed. It seems there might advantages only if we add a binary library. One more argument to add a binary.

As I suggest above, how about a real d-pointer approach? This would be
true flexibility, we could arbitrarily much more member data in the
future. Again, I'm not suggesting to put all in the d-pointer, the
most important members m_data,m_rows,m_cols can stay directly as
members of Matrix.

It is pretty clear that m_data,m_rows,m_cols must stay directly as members of Matrix. Then as I said my approach allows to do the same than with an extra d-pointer without the malloc overhead.

Now, do you think that your proposed allocatedSize() is OK to have
non-inlined? If yes, then we can put m_allocatedSize in the d-pointer.
This makes things really simple: the only data member that we're
adding to MatrixXf is the d-pointer.

> And finally, remains the question whether resizing above the pre allocated
> size should be allowed or not. Here it is pretty clear that the default must
> allows that. So this could be achieved using a Matrix option, but that's a
> bit overkill to introduce a new just for that. So it could also be a runtime
> flag. See, I've already found a practical use case for the extra data!
>
> What do you think?

What do you think ? :)

Benoit

>
> Gael.
>
> On Thu, Mar 4, 2010 at 3:58 AM, leon zadorin <leonleon77@xxxxxxxxx> wrote:
>>
>> On 3/4/10, Márton Danóczy <marton78@xxxxxxxxx> wrote:
>> > On 3 March 2010 15:44, leon zadorin <leonleon77@xxxxxxxxx> wrote:
>> >> double * reserved(aligned_alloc(8192));
>> >>
>> >> // 3x3
>> >> Map<double> m(reserved, 3, 3);
>> >>
>> >> // 10x10 -- resize w/o reallocation -- equivalent to NeverRealloc w/o
>> >> much repatching(?)
>> >> m.remap(reserved, 10, 10);
>> >
>> [...]
>> > Matrix<float, Dynamic, Dynamic, NoRealloc> A, B, C;
>> >
>> > A.reserve(16); A.resize(4,4);
>> > B.reserve(25); B.resize(5,5);
>> > C.reserve(81); C.resize(9,9);
>> >
>> > B=A; //this should work and resize B to 4x4
>> > B=C; //this should raise an assertion failure
>> >
>> > To achieve this behaviour, we do need a dedicated class -- or an
>> > option, such as in the patch.
>>
>> Sure... but not quite -- if we are talking about adding extra
>> "sanity-checking" in the debug builds of an app, and if current API
>> has no such assertions, then some form of changes would be needed.
>>
>> But in such a case, still -- I don't think Benoit was arguing against
>> any changes -- only in favor of smaller/more-elegant ones...
>>
>> Map<> obj. ctor could indeed have an additional arg denoting the max
>> size of the (re)mappable mem (on which it is mapping)... or indeed, as
>> others have alluded to, one could pass an 'array object' instead of
>> just 'float/double *' (given that what you are really trying to
>> 'sanity-check' is the boundary -- and such is not meant to
>> auto-reallocate but to assert -- implying that this should not be
>> happening 'by design') -- so a logical conclusion would be for Map<>
>> to be able to map from a vector/array type (e.g. ::boost::array
>> semantics), then:
>>
>> array a1(25);
>> array a2(81);
>>
>> Map m1(a1, 5, 5);
>>
>> Map m2(a2, 9, 9);
>>
>> m1 = m2; // assertion due to boundary-checking of 'intelligent' arrays...
>>
>> or if constructing/mapping explicitly from "double/float *"
>>
>> double *a1(alloc(25));
>> Map m1(a1, 5, 5, 25); // i.e. additional arg denoting max mappable mem.
>>
>> double *a2(alloc(81));
>> Map m2(a2, 9, 9, 81);
>>
>> m1 = m2; // the same assertion... (ifndef NDEBUG of course)
>>
>> In fact, given that you are only requiring an assertion (not exception
>> throwing), then the actual presence of the above fields (e.g. in the
>> Map class, the 'bounds-keeping' variable could, itself, exist only
>> #ifndef NDEBUG
>> size_t max_mappable_size;
>> #endif
>>
>> Alternatively, as you mentioned -- another class may be used... but,
>> if such is to be the case, then what I was trying to say was that this
>> class feels like being more of a facade design-pattern, not a patch to
>> the existing class:
>>
>> struct blah {
>> double * data;
>> Map m;
>> blah()
>> : data(allocate), m(data, x, y)
>> {
>> }
>> etc. etc. -- like op= with checking for memory-bounds violations....
>> };
>>
>> ... but only if one wants it...
>>
>> from your code-example it looks to me like it really should be subject
>> to an "array" object... or a Map object... as memory-bounds checking
>> is really a question of abstracting the c-style 'arrays' in the first
>> place... ideally...
>>
>> kind regards
>> Leon.
>>
>>
>
>