Re: [eigen] Malloc-free dynamic matrices

[ Thread Index | Date Index | More Archives ]

2010/3/4 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> 2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> On Thu, Mar 4, 2010 at 5:54 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> wrote:
>>> 2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>> >
>>> > yes there are hundreds of way to achieve these goals without touching
>>> > Matrix, but none of the solutions will be as convenient as having it by
>>> > default directly inside Matrix. For instance, nobody would ever think
>>> > about
>>> > removing reserve from std::vector! I really don't see any valuable
>>> > argument
>>> > against doing so. Let me be clear, such a change won't affect at all the
>>> > performance for normal use cases, while it will significantly speedup
>>> > and
>>> > simplify (via push* functions) the assembly of large matrices and
>>> > vectors.
>>> > So let's rather focus about implementing it inside Eigen.
>>> OK.
>>> > Regarding the question I asked about offering a 2D or 1D reserve, I
>>> > would
>>> > finally go for the 1D variant. The main reasons are:
>>> > - consistent with the behavior of MaxRows/MaxCols
>>> > - no stride => all Matrix objects keep the linear access flag
>>> > - easier to make it consistent with other resizing features
>>> >
>>> > Everybody agree with that?
>>> I didn't follow the 1D vs 2D debate but at least i don't object. And
>>> definitely, I want all Matrix objects to keep the linear access flag.
>>> >
>>> > Ok, now we need an additional m_allocatedSize integer in the class
>>> > Matrix
>>> > (only for dynamically sized matrices, but without any option to
>>> > enable/disable it, I want it to be the default and unique choice). Note
>>> > that
>>> > such a change implies an ABI change, and so it is important do discuss
>>> > it
>>> > now.
>>> Yes.
>>> > Moremover in order to be able to freely add new features in the future
>>> > without breaking the ABI, I would even add at least one other integer,
>>> > just
>>> > in case. For instance to have dynamic flags, or so. I'm not 100% sure
>>> > about
>>> > that, but why not?
>>> >
>>> > To add these extra attributes, I see two options:
>>> >
>>> > The standard one using ei_int_if_dynamic<>. con: increases
>>> > sizeof(MatrixXf)
>>> >
>>> > The second one is to put them at the begining of the dynamically
>>> > allocated
>>> > buffer. So more precisely, the allocation/dealocation would become:
>>> >
>>> > alloc:
>>> > m_data = ei_aligned_new(size+<16 bytes>) + <16 bytes>;
>>> > allocatedSize() = size;
>>> >
>>> > dealloc:
>>> > ei_aligned_delete(m_data-<16bytes>);
>>> >
>>> > int& allocatedSize() {return (m_data-<16bytes>);}
>>> >
>>> > Disclaimer: yes the above is not C++, it is just to picture the idea!
>>> This looks like going only halfway toward heap-stored data. Instead,
>>> why not take the bolder move of adding a d-pointer? We would put there
>>> any additional data that is OK to access with non-inline functions. So
>>> we would keep directly as data members the array pointer m_data and
>>> the dimensions m_rows and m_cols so we can still call
>>> rows()/cols()/data() at zero cost (useful as they are used all the
>>> time) but other less frequently used data could be deferred onto the
>>> d-pointer and accessed through no-inline accessors.
>> To be honest I don't see how adding a d-pointer can offer more flexibility.
> With your approach, any additional member data that we may want to add
> in the future, has to fit in the fixed number of bytes that were
> reserved, like 8 bytes or 16 bytes. We have to decide once and for all
> how much space we reserve for additional members. Moreover, once we've
> added a member, we have to keep its offset fixed forever. All of that
> can theoretically be overcome by using a d-pointer.
>> My proposal affords the same with less memory and runtime overhead: a true
>> d-pointer would requires in addition one pointer, one call to malloc,
> Yep, I thought about that just after sending the e-mail. The solution
> might be to merge this idea with your idea: allocate at once the
> matrix array and the D_structure. But in order to allow the
> D_structure to grow in the future, place it after the array, not
> before, and access it only with non-inline accessors .... now here's
> the catch... that must be compiled into a shared library :( I didn't
> think about that in my previous e-mail, but the d-pointer approach can
> only work if we have a binary shared library :( Though at that point,
> having such a tiny library would solve a bunch of problems at once
> (cache size parameters, etc). I don't know what to think about that.
>> and
>> the extra memory used by the system to manage the allocated memory.
> That also goes away once we unify the 2 malloc calls into one, see
> above (like in your proposal).
>> The main
>> concern is of course the additional malloc call per object. My proposal
>> requires only sizeof(D_structure) which must be a multiple of 16bytes to
>> preserve alignment.
> So, with my above updated proposal, this concern too goes away.
>> Actually, after thinking about it, I'm not sure we can really add new stuff
>> in what I called above the "D_structure" without breaking the ABI, or
>> without adding a binary library.
> Hehe, exactly what I didn't think of in my previous email :(
> See above.
>> Indeed, let's assume we have a library L
>> (e.g., an optimized solver based on Eigen and using Eigen's Matrix for its
>> public interface) and an application A which uses L via dynamic linkage. In
>> 3.1 we add a new variable in the D_structure of Matrix. Then the library L
>> is recompiled against Eigen 3.1 to get advantage of this shiny new feature.
>> Theoretically, the application A which was linked against Eigen 3.0 should
>> still work. And to make that happens we must make sure that Matrix objects
>> created by the application A are created with the Eigen 3.1 code to properly
>> create and initialize the D_structure. And achieve this I don't see other
>> solutions than having a "ei_create_and_initialize_D_structure()" function in
>> a binary (or .dll).
> Totally agree.
>>> > This approach has many advantages:
>>> > - sizeof(MatrixXf) remains minimal (good to pass/return MatrixXf by
>>> > value,
>>> > etc.)
>>> I don't understand this.
>>> If we pass/return a MatrixXf by value, the copy constructor gets call,
>>> which deep-copies the whole array anyway. So how good is it to have a
>>> smaller size on the stack? Do you have copy-on-write in mind?
>> I just wanted to emphasize that with this approach sizeof(MatrixXf) does not
>> increase, in case some people care. The example was bad indeed.
> ok.
>>> > - we have 12bytes free for future uses
>>> That we can have either way: on the heap (your approach) or on the stack.
>>> Here's another comment. Why do you count in bytes? We only need to
>>> preserve the ABI _within_ each given architecture, so we can define
>>> the ABI in terms of sizeof(int) and sizeof(void*). So instead of
>>> reserving, say, 16 bytes, i'd rather be talking in terms of
>>> "2*sizeof(int)+sizeof(void*)"
>> It's just that the offset must be a multiple of 16bytes to preserve
>> alignment. In practice, of course it is better to define a D_structure, and
>> take its size with some padding.
> First of all, regardless of whether we do a D pointer or not, I'd
> suggest putting the D structure after the matrix array, not before. At
> least that allows to add arbitrarily much member data in the future,
> instead of hardcoding a hard limit today.
> Then the alignment requirement on D is softened a lot, though it still
> wouldn't hurt to guarantee some alignment. Especially if we allow
> matrices with sizeof(Scalar) very small, the pointer at the end of the
> array could have very poor alignment. So why not 16 bytes, although
> unless we put SIMD stuff in the D structure 8 bytes should be enough.
>>> > And for the cons:
>>> > - well I cannot see any, maybe it is a bit hackish but who cares?
>>> Indeed there are not too many inconvenients. I'm just checking with
>>> you if there are advantages? :)
>> Indeed. It seems there might  advantages only if we add a binary library.
>> One more argument to add a binary.
> So you'd be in favor of adding a tiny binary library?
> I'm completely hesitating, I can't make a decision on that. I guess
> that if we treat this issue simultaneously with other issues that
> would benefit from a binary lib, such as cache size runtime
> parameters, then the case for a binary lib gets quite strong. On the
> other hand it will require good communication and documentation, it
> would be great to keep it optional (maybe make its code optionally
> available as a header file...), and it should be WTFPL-licensed.

ok so here's my feeling on the binary library issue.

I still don't think that we should add one:

 - on the issue being discussed here, I think that taking your
approach and instead putting the "D_structure" at the end (not before
the start), gives us quite a lot of flexibility already, basically we
can add arbitrarily many data members, we just can't change their
offsets as a d-pointer would allow. Of course, in order to address the
D_structure efficiently, we can store as a plain data member a pointer
to it (or its offset from the array pointer). So, no d-pointer but
still 90% of the flexibility.

- on the issue of runtime cache sizes, we still had several solutions
without a binary library, like providing a tiny .cpp file, etc.

No strong opinion... I just want to make sure that if we do introduce
a binary library, it's for a good reason... as it has a great cost in
terms of documentation, communication, and barrier to entry.


> Benoit

Mail converted by MHonArc 2.6.19+