Re: [eigen] Malloc-free dynamic matrices



On 3/5/10, Eamon Nerbonne <eamon.nerbonne@xxxxxxxxx> wrote:
> I'm against adding extra fields to MatrixXf and would prefer avoiding binary
> library.

If some want MatrixXf to have memory caching/reservation and don't mind the
extra flags, while others don't want any extra flags -- what are the reasons
for not simply having a template type/policy parameter, say MemoryAllocator,
which would be used by MatrixXf (similar to how the std:: containers take
allocators)...

Any fancy memory reservation/caching is then relegated to the context
of the allocator type/policy...

.... and anyone who wants advanced block-reallocation/caching/etc. may do:
MatrixXf<CustomSuperDuperAllocator...>
and live with the extra members/overhead that this allocator implies;

whilst others may use MatrixXf<SmallBasicAllocator...> and not pay the price
for anything a custom/fancy allocator adds.
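
To picture it, a rough sketch of the idea only -- the class and allocator
names below are made up, this is not Eigen's actual Matrix:

#include <cstdlib>

// A stateless allocator: adds no per-matrix state, so users who don't want
// the fancy stuff don't pay for it.
struct SmallBasicAllocator {
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void  deallocate(void* p)         { std::free(p); }
};

// Hypothetical matrix taking the allocator as a policy, like std containers.
template <typename Scalar, typename Allocator = SmallBasicAllocator>
class DynMatrix {
  Allocator m_alloc;   // a caching "CustomSuperDuperAllocator" would keep its state here
  Scalar*   m_data;
  int       m_rows, m_cols;
public:
  DynMatrix(int rows, int cols)
    : m_data(static_cast<Scalar*>(m_alloc.allocate(sizeof(Scalar) * rows * cols))),
      m_rows(rows), m_cols(cols) {}
  ~DynMatrix() { m_alloc.deallocate(m_data); }
};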

The ABI, of course, may be an issue -- but STL/std code uses this approach
anyway... Besides, header-only libraries and the one-definition rule in C++
kind of make me feel that, as Benoit mentioned, if you're going to worry
about the ABI, you might as well set it in stone for a while (e.g. for a
major version number of the lib)... just like any interface or protocol
(e.g. IP) -- structural changes are made rarely and with long "transitional"
latency :-)

.... maybe I have missed something, or maybe this is already how you guys
are doing it in Eigen, sorry -- it's not really that important to me either
way...

.... personally, I'll be using Map<> et al. quite frequently... if for no
other reason than to be able to allocate/position multiple distinct matrices
(which are frequently used together in some formula) in one contiguous block
of memory (i.e. tightly located w.r.t. each other, one after another), thus
further improving CPU-cache utilization, since the CPU's cache-line size is
greater than one byte... similar to the reason frequently-used variables in
structs are usually grouped together as well...
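
e.g. something along these lines (this uses Eigen's real Map class; the
sizes and the single std::vector buffer are just for illustration):

#include <vector>
#include <Eigen/Core>

int main() {
  const int n = 4;
  // one contiguous buffer holding A (n*n), B (n*n) and x (n), back to back
  std::vector<float> buf(n * n + n * n + n);

  Eigen::Map<Eigen::MatrixXf> A(buf.data(),             n, n);
  Eigen::Map<Eigen::MatrixXf> B(buf.data() + n * n,     n, n);
  Eigen::Map<Eigen::VectorXf> x(buf.data() + 2 * n * n, n);

  A.setRandom(); B.setRandom(); x.setRandom();
  Eigen::VectorXf y = A * B * x;   // the three operands sit next to each other in memory
  return 0;
}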

anyway -- I'll shut up now. Keep up the good work and thanks for
writing a great library!

leon.

> The binary library is simply a simplicity issue; it's very nice not to have
> to add an extra build step.  Also, due to limitations of make, it sometimes
> causes things to break (say, when you change calling conventions in the
> makefile and the binary module isn't rebuilt since the source files haven't
> changed) and it's just an extra complexity that's nice to avoid.  If it's
> only necessary to permit ABI compatibility in the face of changes to eigen,
> I'd prefer to have the choice of losing the ABI guarantee and gaining the
> simplicity.
>
> The extra fields in MatrixXf will probably cause performance degradation for
> a few not-unreasonable use cases.  When using eigen, one might deal with
> small, statically sized matrices - and all is well.  If you're dealing with
> large dynamically sized matrices, the extra overhead won't matter.  In
> between, it will:
>
> If you're dealing with many matrices+vectors of small (but statically
> unknown) size, right now, the best option is plain MatrixXf.  It may seem
> that Matrix<float,Dynamic,Dynamic,0,16,16> is a reasonable alternative (and
> avoids dynamic allocation to boot!), but that approach reserves the full
> 16x16 matrix even for small cases; so if you're dealing with lots of small
> vectors and matrices, that's going to be huge (and impractical) memory
> inflation.  Also, even though the matrix actually reserves 256 elements, it
> will assert an error if presented with a 3x17 matrix (I haven't checked
> whether this actually causes a real error if the assert isn't on, probably
> not).  To use dynamic-size matrices with fixed storage right now, you'd need
> to have a healthy padding in both rows and cols.  Further, your code is
> suddenly less reusable - what if you encounter the exceptional case that
> exceeds the predicted limits?  If you just use MatrixXf, it works (though
> possibly slowly) - but if you use a fixed-size storage, it will silently
> corrupt in NDEBUG mode - not nice, since this is likely to occur in code that
> *expects* variable-sized input.
>
> So, I don't think it's a safe assumption that MatrixXf will not be used for
> small matrices - it's the only real option for small, variable-sized
> matrices, in fact (unless you have very few and memory inflation doesn't
> matter).
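
For reference, a minimal sketch of the trade-off described above --
MaxRows/MaxCols are Matrix's real template parameters, but the typedef name
is made up:

#include <Eigen/Core>
using Eigen::Matrix; using Eigen::MatrixXf; using Eigen::Dynamic;

// Dynamic sizes, but storage is a fixed 16x16 block held inside the object:
// no heap allocation, at the cost of always reserving 256 elements and of
// rejecting anything larger than 16 in either dimension (assert in debug).
typedef Matrix<float, Dynamic, Dynamic, 0, 16, 16> MatrixUpTo16f;

MatrixUpTo16f a(3, 5);     // fine: uses 15 of the 256 reserved elements
// MatrixUpTo16f b(3, 17); // asserts: 17 > MaxCols, even though 3*17 < 256
MatrixXf c(3, 17);         // plain MatrixXf: heap-allocated, but always works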
>
>
> --eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163
>
>
> On Fri, Mar 5, 2010 at 09:24, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxx>wrote:
>
>>
>>
>> On Fri, Mar 5, 2010 at 6:25 AM, Benoit Jacob
>> <jacob.benoit.1@xxxxxxxxx>wrote:
>>
>>> ok i can finally reply.....
>>>
>>> first of all let me say that my idea of
>>> d-pointers-without-a-shared-library fails because of the case when a
>>> program would first create an Eigen 3.0 matrix and then load (at
>>> runtime) a library that uses Eigen 3.1. In other words, it only works
>>> as long as any library loading predates any matrix creation...
>>>
>>> 2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>> >
>>> > Ok, let me try to summarize. As far as I understood, the discussion led us to two
>>> > options:
>>> >
>>> > Option1: we add a tiny shared library implementing the
>>> > creation/initialization of the D_structure and a method to get the
>>> > D_structure.
>>>
>>> Yes; nitpick: rather, it offers methods to get the individual data
>>> members in the D_structure. Getting the d pointer itself is useless,
>>> precisely because you don't want to have to use hardcoded offsets.
>>>
>>> > Then where and how it is stored is a detail. This approach
>>> > offers full flexibility in the future but requires a shared lib.
>>>
>>> Yes.
>>>
>>> >
>>> > Option2: if we don't want a shared lib, then there is no way we can
>>> change
>>> > the size of the D_structure and/or the way it is initialized.
>>>
>>> I'm afraid this is true, although I didn't realize this earlier.
>>>
>>> Initially I thought that we could at least add more data members, that
>>> the problem was only that we couldn't remove/change existing members.
>>> But I realize I was wrong. One can't even add data members. That would
>>> already require putting them in a d-pointer.
>>>
>>> > So we can
>>> > still keep some flexibility (~10% ?) by deciding for 3.0 to reserve
>>> > some
>>> > extra bytes for future uses.
>>>
>>> I realize, now, why you were hardcoding a fixed amount of reserved
>>> space. It's indeed the only thing we can do in that direction.
>>>
>>> I'm just wondering what we're going to do with that, and if we're not
>>> going to be forced to write a separate class or Options template
>>> parameter anyway whenever we need to do anything serious...
>>>
>>> > Those bytes will be initialized to 0, and in
>>> > the future their default values must still be zero.
>>>
>>> hm i see. in case the old version creates a matrix that is then used
>>> by the new version.
>>>
>>> > In this case, there is
>>> > indeed no need to store them on the heap, and we can store them on the
>>> stack
>>> > as members of Matrix.
>>>
>>> yes.
>>>
>>> My conclusion:
>>>  - adding a shared library would be giving up a very large advantage
>>> that we have. Being purely headers makes it far easier for people to
>>> use Eigen.
>>>
>>
>> Adding -leigen is not much more complicated. It's only more complicated for
>> people following the devel branch, because they have to do "hg pull -u &&
>> cd build && make && make install && cd -". But OK, as long as we can safely
>> work around this without relying on complex mechanisms, then it's definitely
>> better not to add a shared lib.
>>
>> Tongue in cheek: if we add a shared lib, then no more "how do I compile
>> Eigen? I ran make but nothing happened!" ;)
>>
>>  - once the allocatedSize member is added, I don't see any more potential
>>> reason on the horizon for changing the Matrix ABI, hence no potential
>>> use case for a d-pointer.
>>>  - if such a use case happens it's always possible to do that with a
>>> new template parameter / class.
>>>  - any fixed amount of reserved space can still fail to be enough the
>>> day we need it, while having a constant cost. Hence, I'm not
>>> convinced.
>>>
>>> So my opinion: Option2 without any reserved space, just set in stone
>>> the matrix ABI after you've implemented this change.
>>>
>>
>> I'd still add one integer flag. We can always add new flags while ensuring
>> that 0 is the default, and I already have a few ideas for such runtime
>> flags:
>> - block heap reallocation
>> - mark the object as temporary. E.g., when a user returns a MatrixXf by
>> value => Matrix::operator= can use a cheap swap instead of a full copy:
>>
>> MatrixXf foo() { MatrixXf ret; /*...*/ ret.markTemporary(); return ret;}
>>
>> MatrixXf y;
>> /* ... */
>> y = foo();
>>
>> - etc.
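
A minimal sketch of how such a flag could be exploited in operator= --
markTemporary() and the whole class below are hypothetical, not existing
Eigen API:

#include <algorithm>

class DynMatrixSketch {
  float* m_data;
  int    m_size;
  int    m_flags;                       // must default to 0, forever
  enum { IsTemporary = 0x1 };
public:
  DynMatrixSketch() : m_data(0), m_size(0), m_flags(0) {}
  void markTemporary() { m_flags |= IsTemporary; }

  DynMatrixSketch& operator=(DynMatrixSketch& other) {
    if (other.m_flags & IsTemporary) {
      // the right-hand side is a doomed temporary: steal its buffer
      std::swap(m_data, other.m_data);
      std::swap(m_size, other.m_size);
    } else {
      // ordinary deep copy (resize + element-wise copy, omitted here)
    }
    return *this;
  }
};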
>>
>> gael.
>>
>>
>>>
>>> Benoit
>>>
>>> >
>>> > gael.
>>> >
>>> >
>>> > On Thu, Mar 4, 2010 at 10:43 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx
>>> >
>>> > wrote:
>>> >>
>>> >> 2010/3/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>> >> >> >> > alloc:
>>> >> >> >> > m_data = ei_aligned_new(size+<16 bytes>) + <16 bytes>;
>>> >> >> >> > allocatedSize() = size;
>>> >> >> >> >
>>> >> >> >> > dealloc:
>>> >> >> >> > ei_aligned_delete(m_data-<16bytes>);
>>> >> >> >> >
>>> >> >> >> > int& allocatedSize() {return (m_data-<16bytes>);}
>>> >> >> >> >
>>> >> >> >> > Disclaimer: yes the above is not C++, it is just to picture
>>> >> >> >> > the
>>> >> >> >> > idea!
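
In actual C++ the idea might look roughly like this -- the 16-byte header
and the helper names are just for illustration, and real code would go
through ei_aligned_new:

#include <cstdlib>
#include <cstring>

const std::size_t header_bytes = 16;   // fixed once and for all, keeps 16-byte alignment

// allocate the element array with a small header in front of it;
// allocatedSize lives in that header, so sizeof(Matrix) does not grow
float* allocate_with_header(std::size_t size) {
  char* raw = static_cast<char*>(std::malloc(header_bytes + size * sizeof(float)));
  std::memset(raw, 0, header_bytes);                // reserved bytes default to 0
  *reinterpret_cast<std::size_t*>(raw) = size;      // store allocatedSize
  return reinterpret_cast<float*>(raw + header_bytes);
}

std::size_t allocated_size(const float* data) {
  return *reinterpret_cast<const std::size_t*>(
           reinterpret_cast<const char*>(data) - header_bytes);
}

void free_with_header(float* data) {
  std::free(reinterpret_cast<char*>(data) - header_bytes);
}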
>>> >> >> >>
>>> >> >> >> This looks like going only halfway toward heap-stored data.
>>> Instead,
>>> >> >> >> why not take the bolder move of adding a d-pointer? We would put
>>> >> >> >> there
>>> >> >> >> any additional data that is OK to access with non-inline
>>> functions.
>>> >> >> >> So
>>> >> >> >> we would keep directly as data members the array pointer m_data
>>> and
>>> >> >> >> the dimensions m_rows and m_cols so we can still call
>>> >> >> >> rows()/cols()/data() at zero cost (useful as they are used all
>>> the
>>> >> >> >> time) but other less frequently used data could be deferred onto
>>> the
>>> >> >> d-pointer and accessed through non-inline accessors.
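
A rough picture of that layout (not Eigen's actual code; MatrixPrivate and
the shared-library split are exactly what is being proposed here):

struct MatrixPrivate;            // layout known only to the (hypothetical) shared lib

class MatrixXf_sketch {
  float* m_data;                 // hot members stay inline: data()/rows()/cols()
  int    m_rows, m_cols;         //   keep their zero-cost, header-only accessors
  MatrixPrivate* d;              // everything else hides behind this pointer
public:
  float* data()       { return m_data; }
  int    rows() const { return m_rows; }
  int    cols() const { return m_cols; }
  int    allocatedSize() const;  // non-inline, compiled into the shared lib,
                                 //   so MatrixPrivate's layout may change freely
};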
>>> >> >> >
>>> >> >> > To be honest I don't see how adding a d-pointer can offer more
>>> >> >> > flexibility.
>>> >> >>
>>> >> >> With your approach, any additional member data that we may want to
>>> add
>>> >> >> in the future, has to fit in the fixed number of bytes that were
>>> >> >> reserved, like 8 bytes or 16 bytes. We have to decide once and for
>>> all
>>> >> >> how much space we reserve for additional members. Moreover, once
>>> we've
>>> >> >> added a member, we have to keep its offset fixed forever. All of
>>> that
>>> >> >> can theoretically be overcome by using a d-pointer.
>>> >> >
>>> >> > hm... maybe I haven't been clear, but by "my proposal" I was referring to
>>> the
>>> >> > solution of storing the D_structure in the dynamically allocated
>>> memory,
>>> >> > i.e., with the data.
>>> >>
>>> >> Well let's look at your pseudo code:
>>> >>
>>> >>  m_data = ei_aligned_new(size+<16 bytes>) + <16 bytes>;
>>> >>  allocatedSize() = size;
>>> >>
>>> >>  dealloc:
>>> >>  ei_aligned_delete(m_data-<16bytes>);
>>> >>
>>> >> If I understand correctly, you're reserving a fixed amount of memory
>>> >> (here 16 bytes) for the D_structure just before the location pointed
>>> >> to by m_data. So yes it's on the heap, that's what I understood, but
>>> >> you still hardcode the number of bytes that your D_structure may have.
>>> >>
>>> >> >> > My proposal affords the same with less memory and runtime
>>> overhead: a
>>> >> >> > true
>>> >> >> > d-pointer would require in addition one pointer, one call to
>>> malloc,
>>> >> >>
>>> >> >> Yep, I thought about that just after sending the e-mail. The
>>> solution
>>> >> >> might be to merge this idea with your idea: allocate at once the
>>> >> >> matrix array and the D_structure. But in order to allow the
>>> >> >> D_structure to grow in the future, place it after the array, not
>>> >> >> before, and access it only with non-inline accessors .... now
>>> >> >> here's
>>> >> >> the catch... that must be compiled into a shared library :( I
>>> >> >> didn't
>>> >> >> think about that in my previous e-mail, but the d-pointer approach
>>> can
>>> >> >> only work if we have a binary shared library :( Though at that
>>> point,
>>> >> >> having such a tiny library would solve a bunch of problems at once
>>> >> >> (cache size parameters, etc). I don't know what to think about
>>> >> >> that.
>>> >> >
>>> >> > You cannot easily put it at the end because ideally we would store
>>> the
>>> >> > allocatedSize variable in the D_structure, and if you put it at the
>>> end
>>> >> > of
>>> >> > the data, you need the allocatedSize to access the D_structure...
>>> >>
>>> >> Well yes, in my proposal of putting the D_structure at the end, we
>>> >> have to add a new data member to Matrix, which can be either the
>>> >> offset or why not directly the pointer to the D_structure. But I don't
>>> >> think that it should be the allocatedSize that we should store, and
>>> >> actually it still wouldn't be too convenient to address the
>>> >> D_structure (need to take padding into account...)
>>> >>
>>> >> Then, from the moment we're storing 2 pointers, m_data and m_d, it
>>> >> doesn't matter anymore which one is at the beginning and which one is
>>> >> at the end of the buffer.
>>> >>
>>> >> Is that a big deal to add one more data member to MatrixXf...?
>>> >>
>>> >> Though in that vein, one might go further and ask why we're preferring
>>> >> to put stuff on the heap at all, why not just add plain data members
>>> >> to MatrixXf...? I'm not sure why sizeof(MatrixXf) matters more than
>>> >> the size of the allocated buffer.
>>> >>
>>> >> > Since
>>> >> > this whole approach  can only work via a shared library,
>>> >>
>>> >> ...if we want a real d-pointer. Without a shared lib, we can still
>>> >> have a D_structure, it's just that the application using Eigen
>>> >> hardcodes the D_structure data layout at compile time, so we don't get
>>> >> the flexibility of a d-pointer.
>>> >>
>>> >> >> I'm completely hesitating, I can't make a decision on that. I guess
>>> >> >> that if we treat this issue simultaneously with other issues that
>>> >> >> would benefit from a binary lib, such as cache size runtime
>>> >> >> parameters, then the case for a binary lib gets quite strong. On
>>> >> >> the
>>> >> >> other hand it will require good communication and documentation, it
>>> >> >> would be great to keep it optional (maybe make its code optionally
>>> >> >> available as a header file...), and it should be WTFPL-licensed.
>>> >> >
>>> >> > Same here, though I'm becoming more and more in favor of a shared
>>> >> > library, as it might solve many issues.
>>> >>
>>> >> I don't know... above we're discussing a very good solution without a
>>> >> binary lib, and below you have a great idea for the cache size problem
>>> >> too:
>>> >>
>>> >> >
>>> >> > Regarding runtime settings without a shared lib, I was thinking
>>> >> > about
>>> >> > using
>>> >> > a static variable inside a function:
>>> >> >
>>> >> > // internal
>>> >> > enum CacheAction { set, get };
>>> >> > int manage_cache_size(CacheAction action, int v = 0)
>>> >> > {
>>> >> >   static int value = EIGEN_DEFAULT_CACHE_SIZE;
>>> >> >   if (action == set) value = v;
>>> >> >   return value;
>>> >> > }
>>> >> >
>>> >> > // public:
>>> >> > int cacheSize() { return manage_cache_size(get); }
>>> >> > void setCacheSize(int v) { manage_cache_size(set,v); }
>>> >> >
>>> >> > but I'm really unsure about that...
>>> >>
>>> >> wow, that looks like a great idea!
>>> >>
>>> >> Such a static variable in a function works exactly like a global
>>> >> variable from a library as far as we're concerned... as far as I can
>>> >> see.
>>> >>
>>> >> Benoit
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>
>


