Re: [eigen] On a flexible API for submatrices, slicing, indexing, masking, etc.

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]




On Fri, Dec 30, 2016 at 2:54 AM, Yuanchen Zhu <yuanchen.zhu@xxxxxxxxxx> wrote:

 
First, using c<> to encode a compile time parameter is a rather novel convention, foreign to most programmers, engineers, matlab users, or basically anyone.

Yes, I'm also not 100% sure about it, but at least it has the great advantage of not disturbing the parameter orders. The implementation is also compatible with integral_constant defined with literals (using namespace boot::hana::literals): 3_c 

​Interesting. User literal changes the equation​. To me range(1, n, 3_c) does look cleaner than range(1, n, c<3>). And most importantly the name space of user literals is significantly less populated than that of types. I would even argue it should be the default and preferred notation.

The downside is of course it's still a new convention.

There are more downsides:
- c++14 only
- does not work with template parameters as in: template<class T> void foo() { span(i,c<T::Bar>); }
- much more costly on the compiler (need to parse a char* using compile-time meta programming)
- I think it's as unusual as c<3> as literals are mainly tailored to convey units. 

So it's just one possibility offered to user for free on Eigen's side, but that cannot be the default recommended approach!
 
Then, to be honest, I don't think we can define such a Eigen::c, as importing Eigen namespace would be too polluting. So we need to find another default name anyway. Some (not great) possibilities:

- IndexC<N>
- CIndex<N>
- CstIndex<N>
- CTIndex<N>
 
For the "start" argument, currently I don't really plan to allow for compile-time value, though the case start=0 is sometimes useful to figure-out alignment or for sparse matrices. This case is currently part of the Block API using specialized members (middleRows, leftCols, etc.)

​I think the user will expect a compile-time start/stop to be valid usage, especially since span allows TWO of its parameters to be static. It is going to be a real hassle to remember what can be static and which cannot.

OK, for more consistency we can easily accept compile-time value for each parameter.
 
Also, sometimes people may just want to provide a pair of compile-time (first, last), and get in return a compile time sized matrix, especially in unit tests. In that case, my order of preference is:

M(range<2, 10, 2>)          // pretty succinct

but does not work if not all parameters are compile-time values.
 
M(range(2_c, 10_c, 2_c))    // acceptable, especially if editor color code 2_c's as literals.
M(range(c<2>, c<10>, c<2>)) // Ugh! So much noise for so simple an _expression_

but so far, that's still the prettiest option that satisfies all constraints.
 
Still, when you read old code, or debug a hard to track memory bug, do you ever get that nagging suspicion that you have the order of last and step wrong somewhere? I know I do.

yes, I've already acknowledged that I'm never 100% sure about the argument order of LinSpaced and Constant, but I think that's mainly because there are different kind of optional arguments. For range/span, I think it's ok as there is only one optional argument.

A more subtle but more important question though, is whether the "stop" argument in range(start,stop) should be inclusive (=last, as in MatLab, symmetric with the start argument) or exclusive (=end, as in C++ STL). I'm tempted to prefer the inclusive variant but since the exclusive one is more standard in C++, I'm puzzled...
 
Actually, I just though about another case that is not covered by the current proposal, that is templated code where the length (and/or step) could be either a compile-time or run-time value. For instance, with the block API we have:

A.block<Rows,Cols>(i,j,rows=Rows,cols=Cols)

this way one may write:

A.block<Dynamic,ColsAtCompileTime>(i,j,rows,actual_cols)

with ColsAtCompileTime being either Eigen::Dynamic (=-1) or a positive compile-time integer. 

If we want to support this flexibility for both lengths and steps (and I think we do), then we first need to introduce a Eigen::DynamicStep=0 value because -1 is valid step whereas 0 is not, and then, following the current proposal, we could imagine something like:

span(start, c<LEN>(actual_len), c<STEP>(actual_step))

and let the span function figure-out whether the compile-time value is truly a compile-time value, or whether it should get the run-time value passed as argument.

Yeah I agree you probably need something like c<STEP>(actual_step) for generic code. Although I'd argue you should still allow c<STEP> and c(step) to work. 

yes, of course!
 
​Here is another related idea: Instead of using a generic c<N> to provide step or length, make both of them named parameters. This way we even remove artificial conceptual separation of range and span (which is yet another convention to memorize). Otherwise, when you see span(3, 10), you will always have a nagging suspicion: is this 3..10 or 3..12?

Here're some examples:

V(span(i, j))             // first..last
V(span(i, j, step(s)))    // dynamic step
V(span(i, j, step<S>))    // static step
V(span(i, j, step<S>(s))) // support both dynamic and static step

V(span(i, len(n)))           // dynamic span length
V(span(i, len<N>))           // static span length
V(span(i, len<N>(n))         // support both dynamic and static span length
V(span(i, len<N>, step(s)))  // static size, dynamic step
V(span(i, len(n), step<S>(s)))  // dynamic size, static/dynamic step.
...

We now have a simple consistent rule to remember: non-index parameters, len and step, are always named, and can be dynamic, or static, or both.  This syntax adds a little verbosity, but avoids lots of confusion and greatly improves readability, imo. 

At some points I've considered such an approach and did not pushed it too much because of verbosity, but after all, as we probably need a less intrusive name for c<N> anyway, this option seems worth to be thoroughly considered. Some remarks:

- you kept "span" (one less character?), but I think "range" is more intuitive (minor)

- Thorough Eigen we use the word "size", so we should probably use Size<N>(n). Note the upper case "S" to convey that Size does not return the size of an object (as std::size), but rather constructs an object representing a size. Same for "step" -> "Step". This also reduces name collisions.

- I think we could allow unnamed steps without much ambiguity:  range(i, j, s) and range(i, Size(n), s), in particular the second version is unambiguous.
 
We may or may not want support static indices. If we do, i and j can be supplied as compile time constants using the user suffix syntax, e.g., 3_c, as you described before. Alternatively, we can have the rule that indices can be supplied as template parameters to span. Consider:

V(span<3, 6>)               vs V(span(3_c, 6_c))
V(span<3, 6>(step(s))       vs V(span(3_c, 6_c, step(s)))
V(span<3, 6>(step<2>))      vs V(span(3_c, 6_c, step<2>))
V(span<4>(len(n))           vs V(span(4_c, len(n))
V(span<6>(len<8>, step(s))) vs V(span(6_c, len<8>, step(s));

All except the first row look too noisy to me, so maybe we should just not support static indices...

If we go with the span/len/step (or rather range/Size/Step) approach and want compile-time indices, then the left column seems preferable to me. I see different use cases:

1- for sparse matrices, knowing that start=0 or start=last (with step=-1) is useful. For this we could imagine something else:

    range("",j)
    range(Eigen::last,j,Step<-1>)

   We need a Eigen::last or Eigen::end "keyword" anyway to index relatively from the end. Using "" as a default start=0 might sounds odd, but it has the advantage of not having to introduce one more "keyword" Eigen::zero or Eigen::first.

2 - Knowing the start at compile-time might also help to compute alignment, especially in the common case of start==0, which can be handled as in the previous point. Of course, in theory for that purpose we are rather interested in knowing whether start is a multiple of some power of two numbers... but that's too far fetched!

3 - Knowing both start and stop are known at compile-time to figure-out the size at compile-time, for this we only need:

       range<START,STOP>([optional step]);
   range<START,STOP>(start,stop[,optional step]);

   that would be aliases for range(start, Size<STOP-START(+1?)>(stop-start(+1?)) [,optional step]);


gael



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/