Re: [eigen] On a flexible API for submatrices, slicing, indexing, masking, etc.


Regarding the initial subject, I'm pretty satisfied with the current proposal of using:

range(start, stop);
range(start, stop, step);
range(start, stop, c<STEP>);

span(start, length);
span(start, length, step);
span(start, c<LENGTH>, c<STEP>);

I have a few concerns regarding the syntax of c<>:

First, using c<> to encode a compile-time parameter is a rather novel convention, foreign to most programmers, engineers, MATLAB users, or basically anyone.

Second, it raises the question of whether c<> can be used for any parameter of range and span. People will then pause at every parameter of every invocation of range and span to consider whether they want a static or a dynamic value, in the hope of producing more optimized code. They might write things like range(c<1>, ncols, c<3>). I feel this adds noise to both thinking and notation, compared with just range(1, ncols, 3).

Third, I feel we need to carefully weigh the syntax burden against the optimization opportunity it creates. Hence we need to examine what is returned by invoking operator() on these range descriptors. It might turn out that a particular static parameter produces only marginally better code, in which case, can we justify paying the extra syntax tax?

Right now, only the static matrix size plays a role in determining the type of the returned block. The static size is also a template parameter of the Eigen matrix type, so users have probably already internalized it. I hope we don't add further rules on the use of static parameters with the new API unless there is a really good reason.
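To make the third point concrete, here is a minimal sketch (not Eigen code) of how a compile-time length can change the type of what gets extracted, while a run-time length cannot. The `Fixed<N>` tag and the `extract` overloads are hypothetical names, and `std::array`/`std::vector` merely stand in for fixed-size and dynamic-size blocks.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Index = std::ptrdiff_t;

// Hypothetical tag carrying a length known at compile time.
template <Index N>
struct Fixed {};

// Run-time length: the size is only a value, so the result must be dynamically sized.
std::vector<double> extract(const std::vector<double>& v, Index first, Index len) {
  return std::vector<double>(v.begin() + first, v.begin() + first + len);
}

// Compile-time length: the size is part of the result type, so a fixed-size
// object (no heap allocation, loop with a known trip count) can be returned.
template <Index N>
std::array<double, static_cast<std::size_t>(N)> extract(const std::vector<double>& v,
                                                         Index first, Fixed<N>) {
  std::array<double, static_cast<std::size_t>(N)> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = v[static_cast<std::size_t>(first) + i];
  return out;
}
```

Whether the fixed-size variant is measurably faster depends on the compiler and the surrounding code; the sketch only shows where the returned type changes.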

With these considerations, let me just throw around some more alternatives:

range(first, last, step = 1)
range<len>(first, last, step = 1)

span(first, len, step = 1)
span<len>(first, step = 1)

where a parameter with a default value is optional.


range(first, last)
range(first, last).step(10)

where naming the step explicitly avoids confusion over whether the order is (first, last, step), as in NumPy, or (first, step, last), as in MATLAB.
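One possible shape for the named-step variant, as a sketch assuming a hypothetical `RangeDesc` type (requires C++14 for the aggregate with a default member initializer). The step is stored as `incr` so that `step` is free to be the setter.

```cpp
#include <cstddef>

using Index = std::ptrdiff_t;

// Hypothetical bounds-based descriptor; the step is set through a named
// member function instead of a positional third argument.
struct RangeDesc {
  Index first;
  Index last;
  Index incr = 1;  // default step

  // Returns a copy with a different step, so range(a, b).step(s) reads the
  // same way regardless of NumPy or MATLAB argument-order habits.
  constexpr RangeDesc step(Index s) const { return RangeDesc{first, last, s}; }
};

constexpr RangeDesc range(Index first, Index last) { return RangeDesc{first, last}; }
```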

On a related note, I would love for people to brainstorm more alternative names for range and span, since these names are used just about everywhere and the conventions are all over the place. For example, Armadillo's span takes (first, last) instead of (first, len), Boost.Range has an irange(first, last) function for integer ranges, and gsl::span takes (first, len).

and forget about the "iota"-based API. What remains unclear to me is how to expose this compact 'c' function. I also have to extend the proof-of-concept demo to be sure that it generalizes to multi-dimensional tensors.


I'm currently experimenting with an API like:

range(start,stop); // step==1
range(start,stop,step); // run-time step
range(start,stop,c<STEP>); // compile-time step

span(start,len); // step==1
span(start,len,step); // run-time step
span(start,len,c<STEP>); // compile-time step
span(start,c<LEN>); // compile-time length and step==1
span(start,c<LEN>,c<STEP>); // compile-time length and step
span(start,c<LEN>,step); // compile-time length and runtime step
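A sketch of how the overloads listed above might collapse into a single function template: each argument is either a plain integer or a compile-time constant wrapped in `std::integral_constant`, and the descriptor records which one it got. `c`, `is_static`, and `SpanDesc` are hypothetical names; `c<N>` as a variable template needs C++14. In this sketch an omitted step becomes a compile-time 1.

```cpp
#include <cstddef>
#include <type_traits>

using Index = std::ptrdiff_t;

// c<N> spells a compile-time index as a value (C++14 variable template).
template <Index N>
constexpr std::integral_constant<Index, N> c{};

// Trait: does this argument type carry its value at compile time?
template <typename T>
struct is_static : std::false_type {};
template <Index N>
struct is_static<std::integral_constant<Index, N>> : std::true_type {};

// Hypothetical length-based descriptor. LenT and StepT are either a plain
// integer (run-time value) or std::integral_constant<Index, N> (compile-time).
template <typename LenT, typename StepT>
struct SpanDesc {
  Index first;
  LenT len;
  StepT step;
  static constexpr bool static_len = is_static<LenT>::value;
  static constexpr bool static_step = is_static<StepT>::value;
};

// One template covers span(start,len), span(start,len,step),
// span(start,c<LEN>), span(start,c<LEN>,c<STEP>), and so on; the argument
// order is identical whether a value is run-time or compile-time.
template <typename LenT, typename StepT = std::integral_constant<Index, 1>>
constexpr SpanDesc<LenT, StepT> span(Index first, LenT len, StepT step = StepT{}) {
  return {first, len, step};
}
```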

And the usage remains the same, e.g.:

B = A(range(...), span(...));
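For a self-contained picture of that call shape, here is a run-time-only sketch with a toy row-major `Mat` standing in for an Eigen matrix. `Seq`, `Mat`, and the inclusive `stop` used by `range` are assumptions made for illustration; whether `stop` is inclusive is still an open question in this thread.

```cpp
#include <cstddef>
#include <vector>

using Index = std::ptrdiff_t;

// Hypothetical run-time index sequence: first, first+incr, ... (size entries).
struct Seq {
  Index first, size, incr;
  Index operator[](Index i) const { return first + i * incr; }
};

// Bounds-based; assumes an inclusive stop reachable in the direction of step.
inline Seq range(Index start, Index stop, Index step = 1) {
  return {start, (stop - start) / step + 1, step};
}

// Length-based.
inline Seq span(Index start, Index len, Index step = 1) { return {start, len, step}; }

// Minimal row-major dense matrix standing in for an Eigen matrix.
struct Mat {
  Index rows, cols;
  std::vector<double> data;
  Mat(Index r, Index c) : rows(r), cols(c), data(static_cast<std::size_t>(r * c)) {}
  double& at(Index i, Index j) { return data[static_cast<std::size_t>(i * cols + j)]; }
  double at(Index i, Index j) const { return data[static_cast<std::size_t>(i * cols + j)]; }

  // A(rowSeq, colSeq): copies the selected rows and columns into a new matrix.
  Mat operator()(const Seq& r, const Seq& c) const {
    Mat out(r.size, c.size);
    for (Index i = 0; i < r.size; ++i)
      for (Index j = 0; j < c.size; ++j) out.at(i, j) = at(r[i], c[j]);
    return out;
  }
};
```

For example, with A(i, j) = 10*i + j on a 4x5 matrix, A(range(1, 3), span(0, 3, 2)) selects rows 1 to 3 and columns 0, 2 and 4.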

Some remarks:

The key advantage here is that the argument order never changes! For the "range" case, it would be OK to write range<STEP>(start,stop), but for the "span" case, where the length may also need to be defined at compile time, this becomes unmanageable.

Another advantage compared to the demo on the wiki is that the "bounds-based" and "length-based" variants look alike, with no odd API like the iota(len) stuff... Of course, this is also a drawback, because there might be some confusion between the names 'range' and 'span'. It might not be 100% obvious that one is based on 'bounds' and the other on a 'length', but here is the rationale:
- 'range' is (for me) more related to the notions of interval, limits, gamut, etc. that are naturally defined by their 'bounds'.
- 'span' is related to the notion of period of time, distance, width, extent, etc. and thus the notion of 'length' here.

Compared to the demo on the wiki page, here the 'step' is moved to the last argument. This is not MATLAB-friendly, but in C/C++ optional arguments go last, so this makes more sense.

Another issue is that this approach is very compact only if we agree to define an Eigen::c, the user imports it into the current scope, and the code uses C++14 (perhaps C++11 with Yuanchen's trick?). Otherwise it can become as verbose and unreadable as:

    Eigen::span(start, Eigen::Index_c<LEN>(), Eigen::Index_c<STEP>())
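A sketch of how a short `c` could sit on top of the verbose spelling: `Index_c<N>` as a type (C++11-friendly, but every use needs an extra `()`), a C++14 variable template `c<N>` as the compact form, and a using-declaration that lets users opt in to the short name. The namespace `demo` stands in for the library namespace, and `length_of` is a hypothetical function used only to show both spellings being accepted.

```cpp
#include <cstddef>
#include <type_traits>

namespace demo {  // stand-in for the library namespace

using Index = std::ptrdiff_t;

// Verbose spelling: a type, so each use is written Index_c<N>().
template <Index N>
using Index_c = std::integral_constant<Index, N>;

// Compact spelling: a C++14 variable template, usable directly as c<N>.
template <Index N>
constexpr Index_c<N> c{};

// Hypothetical function that returns the compile-time length it was given.
template <Index N>
constexpr Index length_of(Index /*start*/, Index_c<N>) { return N; }

}  // namespace demo

// The user brings the short name into the current scope.
using demo::c;
```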

Finally, we also have to decide whether the 'stop' argument should be an inclusive or an exclusive upper bound. To figure this out, I'll prepare a set of examples to see which is most convenient. My intuition is that, even though we are used to the STL's exclusive 'end', an inclusive upper bound would be more symmetric with the inclusive lower bound, and thus indexing from the end should be easier...
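The two conventions differ by one in the stop index. Here is a small sketch, using hypothetical helpers `len_inclusive` and `len_exclusive` and assuming a positive step, of the length each convention implies. Taking the last three of n elements would be range(n-3, n-1) with an inclusive stop versus range(n-3, n) with an exclusive one.

```cpp
#include <cstddef>

using Index = std::ptrdiff_t;

// Number of indices in [first, stop] with a positive step (inclusive stop).
constexpr Index len_inclusive(Index first, Index stop, Index step = 1) {
  return stop < first ? 0 : (stop - first) / step + 1;
}

// Number of indices in [first, stop) with a positive step (exclusive stop, STL style).
constexpr Index len_exclusive(Index first, Index stop, Index step = 1) {
  return stop <= first ? 0 : (stop - first + step - 1) / step;
}
```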

OK, one more: with this approach we could easily enable compile-time start/stop with range to figure out the length at compile time: range(c<START>, c<STOP>). But I don't really see the need for it: IMO, if the size can be known at compile time, you probably know it better than the bounds, especially if you have to think about whether the upper bound is inclusive or exclusive.
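For completeness, a sketch showing that range(c<START>, c<STOP>) could indeed yield a compile-time length. `c`, `FixedRange`, and this `range` overload are hypothetical, and the sketch assumes an inclusive STOP, which is only one of the two conventions under discussion.

```cpp
#include <cstddef>
#include <type_traits>

using Index = std::ptrdiff_t;

template <Index N>
constexpr std::integral_constant<Index, N> c{};

// Hypothetical descriptor whose length is a compile-time constant.
template <Index Len>
struct FixedRange {
  Index first;
  static constexpr Index size = Len;
};

// Both bounds are encoded in the argument types, so the length
// STOP - START + 1 (inclusive STOP assumed) is computed at compile time.
template <Index Start, Index Stop>
constexpr FixedRange<Stop - Start + 1> range(std::integral_constant<Index, Start>,
                                             std::integral_constant<Index, Stop>) {
  static_assert(Stop >= Start, "empty or reversed range");
  return {Start};
}
```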

What do you all think about it? 

