Re: [eigen] Malloc-free dynamic matrices

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


On 3/5/10, leon zadorin <leonleon77@xxxxxxxxx> wrote:
> On 3/5/10, Eamon Nerbonne <eamon.nerbonne@xxxxxxxxx> wrote:
>> I'm against adding extra fields to MatrixXf and would prefer avoiding
>> binary
>> library.
>
> If some want MatrixXf to have memory caching/reservation and don't
> mind extra flags; whilst others don't want extra flags -- what are the
> reasons for not simply having a template type/policy called
> MemoryAllocator which will be used by MatrixXf (similar to STL/::std
> types using allocators)...
>
> Any fancy memory reservation/caching is then relegated to the context
> of the allocator type/policy...
>
> ... and anyone who wants to have advanced
> block-reallocation/caching/etc. may do:
> MatrixXf<CustomSuperDuperAllocator...>
> and live with extra members/overhead that this allocator implies;
>
> whilst others may have MatrixXf<SmallBasicAllocator...> and not pay
> the price of any extra things of a custom/fancy allocator.

One, last (I promise) suggestion: if allocator policy is to be used
and people really don't want any extra memory to be used by MatrixXf
with basic allocation, then instead of

template <typename Allocator>
struct Matrix {
Allocator a;
Matrix()
{
void *x = a.alloc();
}
};

it'd be better to do
template <typename Allocator>
struct Matrix : Allocator {
Matrix
{
void *x = Allocator::alloc();
}
};

Because, even if a given basic Allocator is an empty class (i.e. no
data members whatsoever), the 1st case will use extra 1 to 8 bytes of
memory (i.e. in C++ a complete object cannot have size 0), but in
second case an EBO (empty-base-optimization) will be automatically
used by the compiler to guarantee that 0 extra bytes are associated
with an given empty (e.g. all static methods) Allocator sub-type
-object.

Anyway... it's just for the extra-pedantic -paranoid ones out there :-)

I shall be silent now :-)

Kind regards
Leon.

> The ABI, of course, may be an issue -- but STL/std code seems to use
> this approach anyway... Besides header-only libs and single-definition
> rule in C++... they kinda make me feel that, as Benoit mentioned, if
> you gonna worry about ABI -- then might as well lock-it-in in stone
> for a while (e.g. major ver. num of the lib)... just like any
> interface or a protocol (e.g. IP) -- changing structural things is
> done rarely and with great "transitional" latency :-)
>
> ... may be I have missed something, or may be this is already how you
> guys are doing it in Eigen, sorry -- it's not really that important to
> me either way...
>
> ... personally I'll be using Map<> et al quite frequently... if for no
> other reason but to also be able to allocate/position multiple
> distinct matricies (which are frequently used together in some
> formula) in a continuous block of memory (i.e. tightly-located w.r.t.
> each other -- one after another, thusly further optimizing the
> CPU-cache utilization due to the fact that cpu's cacheline size is > 1
> byte)... similar to reasons why frequently-used vars in structs are
> usually grouped together as well...
>
> anyway -- I'll shut up now. Keep up the good work and thanks for
> writing a great library!
>
> leon.
>
>> The binary library is simply a simplicity issue; it's very nice not to
>> have
>> to add an extra build step.  Also, due to limitations of make, it
>> sometimes
>> causes things to break (say, when you change calling conventions in the
>> makefile and the binary module isn't rebuilt since the source files
>> haven't
>> changed) and it's just an extra complexity that's nice to avoid.  If it's
>> only necessary to permit ABI compatibility in the face of changes to
>> eigen,
>> I'd prefer to have the choice of losing the ABI guarantee and gaining the
>> simplicity.
>>
>> The extra fields in MatrixXf will probably cause performance degradation
>> for
>> a few not-unreasonable use cases.  When using eigen, one might deal with
>> small statically size matrices - and all is well.  If you're dealing with
>> large dynamically sized matrices, the extra overhead won't matter.  In
>> between, it will:
>>
>> If you're dealing with many matrices+vectors of small (but statically
>> unknown) size, right now, the best option is plain MatrixXf.  It may seem
>> that Matrix<float,Dynamic,Dynamic,0,16,16> is a reasonable alternative
>> (and
>> avoids dynamic allocation to boot!), but that approach reserves the full
>> 16x16 matrix even for small cases; so if you're dealing with lots of
>> small
>> vectors and matrices, that's going to be huge (and impractical) memory
>> inflation.  Also, even though the matrix actually reserves 256 elements,
>> it
>> will assert an error if presented with a 3x17 matrix (I haven't checked
>> whether this actually causes a real error if the assert isn't on,
>> probably
>> not).  To use dynamic-size matrices with fixed storage right now, you'd
>> need
>> to have a healthy padding in both rows and cols.  Further, your code is
>> suddenly less reusable - what if you encounter the exceptional case that
>> exceeds the predicted limits?  If you just use MatrixXf, it works (though
>> possibly slowly) - but if you use a fixed-size storage, it will silently
>> corrupt in NDEBUG mode - not nice since it's likely to occur that
>> *expects*variable sized input.
>>
>> So, I don't think it's a safe assumption that MatrixXf will not be used
>> for
>> small matrices - it's the only real option for small, variable-sized
>> matrices, in fact (unless you have very few and memory inflation doesn't
>> matter).
>>
>>
>> --eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163
>>
>>
>> On Fri, Mar 5, 2010 at 09:24, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx>wrote:
>>
>>>
>>>
>>> On Fri, Mar 5, 2010 at 6:25 AM, Benoit Jacob
>>> <jacob.benoit.1@xxxxxxxxx>wrote:
>>>
>>>> ok i can finally reply.....
>>>>
>>>> first of all let me say that my idea of
>>>> d-pointers-without-a-shared-library fails because of the case when a
>>>> program would first create a Eigen 3.0 matrix and then load (at
>>>> runtime) a library that uses Eigen 3.1. In other words, it only works
>>>> as long as any library loading predates any matrix creation...
>>>>
>>>> 2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>>> >
>>>> > Ok, let me try to summarize. As far as I understood, the led us to
>>>> > two
>>>> > options:
>>>> >
>>>> > Option1: we add a tiny shared library implementing the
>>>> > creation/initialization of the D_structure and a method to get the
>>>> > D_structure.
>>>>
>>>> Yes; nitpick: rather, offers methods to get the individual data
>>>> members in the D_structure. Getting the d pointer itself is useless,
>>>> precisely as you don't want to have to use hardcoded offsets.
>>>>
>>>> > Then where and how it is store is a detail. This approach
>>>> > offers full flexibility in the future but requires a shared lib.
>>>>
>>>> Yes.
>>>>
>>>> >
>>>> > Option2: if we don't want a shared lib, then there is no way we can
>>>> change
>>>> > the size of the D_structure and/or the way it is initialized.
>>>>
>>>> I'm afraid this is true, although I didn't realize this earlier.
>>>>
>>>> Initially I though that we could at least add more data members, that
>>>> the problem was only that we couldn't remove/change existing members.
>>>> But I realize I was wrong. One can't even add data members. That would
>>>> already require putting them in a d-pointer.
>>>>
>>>> > So we can
>>>> > still keep some flexibility (~10% ?) by deciding for 3.0 to reserve
>>>> > some
>>>> > extra bytes for future uses.
>>>>
>>>> I realize, now, why you were hardcoding a fixed amount of reserved
>>>> space. It's indeed the only thing we can do in that direction.
>>>>
>>>> I'm just wondering what we're going to do with that, and if we're not
>>>> going to be forced to write a separate class or Options template
>>>> parameter anyway whenever we need to do anything serious...
>>>>
>>>> > Those bytes will be initialized to 0, and in
>>>> > the future their default values must still be zero.
>>>>
>>>> hm i see. in case the old version creates a matrix that is then used
>>>> by the new version.
>>>>
>>>> > In this case, there is
>>>> > indeed no need to store them on the heap, and we can store them on
>>>> > the
>>>> stack
>>>> > as members of Matrix.
>>>>
>>>> yes.
>>>>
>>>> My conclusion:
>>>>  - adding a shared library would be giving up a very large advantage
>>>> that we have. Being purely headers makes it far easier for people to
>>>> use Eigen.
>>>>
>>>
>>> Adding -leigen is not far more complicated. It's only more complicated
>>> for
>>> people following the devel branch because they have to do a "hg pull -u
>>> &&
>>> cd build && make && make install && cd -". But ok, as long as we can
>>> safely
>>> workaround without relying on complex mechanisms, then it's definitely
>>> better not to add a shared lib.
>>>
>>> Second degree: if we add a shared lib no more how do I compile Eigen ? I
>>> run make but nothing happened ? ;)
>>>
>>>  - once allocatedSize member is added, i don't see any more potential
>>>> reason on the horizon for changing the Matrix ABI, hence no potential
>>>> use case for a d-pointer.
>>>>  - if such a use case happens it's always possible to do that with a
>>>> new template parameter / class.
>>>>  - any fixed amount of reserved space can still fail to be enough the
>>>> day we need it, while having a constant cost. Hence, I'm not
>>>> convinced.
>>>>
>>>> So my opinion: Option2 without any reserved space, just set in stone
>>>> the matrix ABI after you've implemented this change.
>>>>
>>>
>>> I'd still add one integer flag. We can always add new flags while
>>> ensuring
>>> that 0 is the default, and I already have a few ideas for such runtime
>>> flags:
>>> - block heap reallocation
>>> - mark the object as temporary. E.g., when a user return a MatrixXf by
>>> value => Matrix::operator= can use a cheap swap instead of a full copy:
>>>
>>> MatrixXf foo() { MatrixXf ret; /*...*/ ret.markTemporary(); return ret;}
>>>
>>> MatrixXf y;
>>> /* ... */
>>> y = foo();
>>>
>>> - etc.
>>>
>>> gael.
>>>
>>>
>>>>
>>>> Benoit
>>>>
>>>> >
>>>> > gael.
>>>> >
>>>> >
>>>> > On Thu, Mar 4, 2010 at 10:43 PM, Benoit Jacob
>>>> > <jacob.benoit.1@xxxxxxxxx
>>>> >
>>>> > wrote:
>>>> >>
>>>> >> 2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>>> >> >> >> > alloc:
>>>> >> >> >> > m_data = ei_aligned_new(size+<16 bytes>) + <16 bytes>;
>>>> >> >> >> > allocatedSize() = size;
>>>> >> >> >> >
>>>> >> >> >> > dealloc:
>>>> >> >> >> > ei_aligned_delete(m_data-<16bytes>);
>>>> >> >> >> >
>>>> >> >> >> > int& allocatedSize() {return (m_data-<16bytes>);}
>>>> >> >> >> >
>>>> >> >> >> > Disclaimer: yes the above is not C++, it is just to picture
>>>> >> >> >> > the
>>>> >> >> >> > idea!
>>>> >> >> >>
>>>> >> >> >> This looks like going only halfway toward heap-stored data.
>>>> Instead,
>>>> >> >> >> why not take the bolder move of adding a d-pointer? We would
>>>> >> >> >> put
>>>> >> >> >> there
>>>> >> >> >> any additional data that is OK to access with non-inline
>>>> functions.
>>>> >> >> >> So
>>>> >> >> >> we would keep directly as data members the array pointer
>>>> >> >> >> m_data
>>>> and
>>>> >> >> >> the dimensions m_rows and m_cols so we can still call
>>>> >> >> >> rows()/cols()/data() at zero cost (useful as they are used all
>>>> the
>>>> >> >> >> time) but other less frequently used data could be deferred
>>>> >> >> >> onto
>>>> the
>>>> >> >> >> d-pointer and accessed through no-inline accessors.
>>>> >> >> >
>>>> >> >> > To be honest I don't see how adding a d-pointer can offer more
>>>> >> >> > flexibility.
>>>> >> >>
>>>> >> >> With your approach, any additional member data that we may want
>>>> >> >> to
>>>> add
>>>> >> >> in the future, has to fit in the fixed number of bytes that were
>>>> >> >> reserved, like 8 bytes or 16 bytes. We have to decide once and
>>>> >> >> for
>>>> all
>>>> >> >> how much space we reserve for additional members. Moreover, once
>>>> we've
>>>> >> >> added a member, we have to keep its offset fixed forever. All of
>>>> that
>>>> >> >> can theoretically be overcome by using a d-pointer.
>>>> >> >
>>>> >> > hm... maybe I've been  clear but as "my proposal" I was referring
>>>> >> > to
>>>> the
>>>> >> > solution of storing the D_structure in the dynamically allocated
>>>> memory,
>>>> >> > i.e., with the data.
>>>> >>
>>>> >> Well let's look at your pseudo code:
>>>> >>
>>>> >>  m_data = ei_aligned_new(size+<16 bytes>) + <16 bytes>;
>>>> >>  allocatedSize() = size;
>>>> >>
>>>> >>  dealloc:
>>>> >>  ei_aligned_delete(m_data-<16bytes>);
>>>> >>
>>>> >> If I understand correctly, you're reserving a fixed amount of memory
>>>> >> (here 16 bytes) for the D_structure just before the location pointed
>>>> >> to by m_data. So yes it's on the heap, that's what I understood, but
>>>> >> you still hardcode the number of bytes that your D_structure may
>>>> >> have.
>>>> >>
>>>> >> >> > My proposal affords the same with less memory and runtime
>>>> overhead: a
>>>> >> >> > true
>>>> >> >> > d-pointer would requires in addition one pointer, one call to
>>>> malloc,
>>>> >> >>
>>>> >> >> Yep, I thought about that just after sending the e-mail. The
>>>> solution
>>>> >> >> might be to merge this idea with your idea: allocate at once the
>>>> >> >> matrix array and the D_structure. But in order to allow the
>>>> >> >> D_structure to grow in the future, place it after the array, not
>>>> >> >> before, and access it only with non-inline accessors .... now
>>>> >> >> here's
>>>> >> >> the catch... that must be compiled into a shared library :( I
>>>> >> >> didn't
>>>> >> >> think about that in my previous e-mail, but the d-pointer
>>>> >> >> approach
>>>> can
>>>> >> >> only work if we have a binary shared library :( Though at that
>>>> point,
>>>> >> >> having such a tiny library would solve a bunch of problems at
>>>> >> >> once
>>>> >> >> (cache size parameters, etc). I don't know what to think about
>>>> >> >> that.
>>>> >> >
>>>> >> > You cannot easily put it at the end because ideally we would store
>>>> the
>>>> >> > allocatedSize variable in the D_structure, and if you put it at
>>>> >> > the
>>>> end
>>>> >> > of
>>>> >> > the data, you need the allocatedSize to access to the
>>>> >> > D_structure...
>>>> >>
>>>> >> Well yes, in my proposal of putting the D_structure at the end, we
>>>> >> have to add a new data member to Matrix, which can be either the
>>>> >> offset or why not directly the pointer to the D_structure. But I
>>>> >> don't
>>>> >> think that it should be the allocatedSize that we should store, and
>>>> >> actually it still wouldn't be too convenient to address the
>>>> >> D_structure (need to take padding into account...)
>>>> >>
>>>> >> Then, from the moment we're storing 2 pointers, m_data and m_d, it
>>>> >> doesn't matter anymore which one is at the beginning and which one
>>>> >> is
>>>> >> at the end of the buffer.
>>>> >>
>>>> >> Is that a big deal to add one more data member to MatrixXf...?
>>>> >>
>>>> >> Though in that vein, one might go further and ask why we're
>>>> >> preferring
>>>> >> to put stuff on the heap at all, why not just add plain data members
>>>> >> to MatrixXf...? I'm not sure why sizeof(MatrixXf) matters more than
>>>> >> the size of the allocated buffer.
>>>> >>
>>>> >> > Since
>>>> >> > this whole approach  can only work via a shared library,
>>>> >>
>>>> >> ...if we want a real d-pointer. Without a shared lib, we can still
>>>> >> have a D_structure, it's just that the application using Eigen
>>>> >> hardcodes the D_structure data layout at compile time, so we don't
>>>> >> get
>>>> >> the flexibility of a d-pointer.
>>>> >>
>>>> >> >> I'm completely hesitating, I can't make a decision on that. I
>>>> >> >> guess
>>>> >> >> that if we treat this issue simultaneously with other issues that
>>>> >> >> would benefit from a binary lib, such as cache size runtime
>>>> >> >> parameters, then the case for a binary lib gets quite strong. On
>>>> >> >> the
>>>> >> >> other hand it will require good communication and documentation,
>>>> >> >> it
>>>> >> >> would be great to keep it optional (maybe make its code
>>>> >> >> optionally
>>>> >> >> available as a header file...), and it should be WTFPL-licensed.
>>>> >> >
>>>> >> > Same here, though I become more and more in favor to a shared
>>>> >> > library
>>>> as
>>>> >> > it
>>>> >> > might solve many issues.
>>>> >>
>>>> >> i don't know... above we're discussing a very good solution without
>>>> >> a
>>>> >> binary lib, and below you have a great idea for the cache size
>>>> >> problem
>>>> >> too:
>>>> >>
>>>> >> >
>>>> >> > Regarding runtime settings without a shared lib, I was thinking
>>>> >> > about
>>>> >> > using
>>>> >> > a static variable inside a function:
>>>> >> >
>>>> >> > // internal
>>>> >> > int manage_cache_size(enum action,int v=0)
>>>> >> > {
>>>> >> >  static int value = EIGEN_DEFAULT_CACHE_SIZE;
>>>> >> >   if(action==set) value = v;
>>>> >> >   if(action==get) return value;
>>>> >> >   return value;
>>>> >> >
>>>> >> > }
>>>> >> >
>>>> >> > // public:
>>>> >> > int cacheSize() { return manage_cache_size(get); }
>>>> >> > void setCacheSize(int v) { manage_cache_size(set,v); }
>>>> >> >
>>>> >> > but I'm really unsure about that...
>>>> >>
>>>> >> wow, that looks like a great idea!
>>>> >>
>>>> >> Such a static variable in a function, works exactly like a global
>>>> >> variable from a library as far as we're concerned... as far as I can
>>>> >> see.
>>>> >>
>>>> >> Benoit
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>
>



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/