Re: [eigen] On a flexible API for submatrices, slicing, indexing, masking, etc.

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]




On Fri, Dec 30, 2016 at 8:51 PM, Yuanchen Zhu <yuanchen.zhu@xxxxxxxxxx> wrote:
- much more costly on the compiler (need to parse a char* using compile-time meta programming)

​I believe​ integer user literals are fed a long long instead of char* by the compiler.

not here because you cannot convert a constexpr to a template parameter. So the trick is to declare the user define literal _c like this:

template <char ...c>
constexpr auto operator"" _c()
{ return integral_constant<internal::parse<sizeof...(c)>({c...})>; }

  
A more subtle but more important question though, is whether the "stop" argument in range(start,stop) should be inclusive (=last, as in MatLab, symmetric with the start argument) or exclusive (=end, as in C++ STL). I'm tempted to prefer the inclusive variant but since the exclusive one is more standard in C++, I'm puzzled...

​Yeah. I think we should use (first, last) convention, since those are used by the technical computing​ community in general. Even boost::range's irange use the (first, last) convention. Also, when negative step comes into question, do you want to write range(n-1, -1, step<-1>) as opposed to the simpler range(n-1, 0, step<-1>), which is also symmetrical to range(0, n-1, step<1>).

I totally agree that the symmetry with the first index is a compelling argument. I essentially wanted to let the other option open for arguments as it is also very common: python, rust, GO, D, C++ range-v3, etc. all use an exclusive upper bound.
  
- The word "range" is overloaded much more than span. There's boost::range, the WIP range v3, and the "range-based" for loops. These big shots can have different conventions, some times [first, last) (range-based for), some times [begin, end) (boost's irange). For span, the only conflict is gsl::span right now (which should really be named array_view anyway, akin to std::string_view. It also takes in (base_ptr, length), so it's categorically different from our use).

gsl::span was initially called array_view, so I doubt they will change its name back. It is also interesting that gsl::span is defined as a "start"+"length" as in my initial distinction between range(start,stop) and span(start,size). Writing span(start,stop) does not sounds right to me.
 
- Thorough Eigen we use the word "size", so we should probably use Size<N>(n). Note the upper case "S" to convey that Size does not return the size of an object (as std::size), but rather constructs an object representing a size. Same for "step" -> "Step". This also reduces name collisions.

Yes I understand Eigen uses "size". I have some slight reservations regarding using size here. As a member function name, it's fine. But as a global function, it reminds me too much of std::size(), and even sizeof(). When I see Size(n), my first mental response is always that it refers to the size of n. 

I could argue the same way against length. length(n) should returns the length of n... Some API (GLSL, GLM) define length as the norm of a vector, so for a scalar length(n) would reduce to abs(n)...  This is why the uppercase is important: in Eigen it conveys that we are constructing an object, not that we are taking the size of something. For instance it could be implemented as:

template<int N=Dynamic> struct Size { ... };

with following usage:

Size<3>()
Size<>(n)
Size<N>(n)

that we trimmed with some hacks to:

Size<3>
Size(n)
Size<N>(n)

Here Size become a function, its an implementation detail, its essence did not change.
 
There is also the lesser issue where matlab's definition of size is [height, width], but the Eigen already adopted the C++ convention so this issue is moot. The word "len" conveys the idea that it is 1D, e.g., string::length.

string::length is a bad example as it is just an alias for string::size.
 
Alternatives are "count", "extent", "length" (spelled in full).  I used "len" since it is the shortest.

I actually liked the word "count" a lot, since it answers a small question at the back of my head: when step is >1, does the length parameter refer to the number of elements in the range, or the number of indices the range straddles across, i.e. (last - first + 1). "Count" unequivocally indicates the former, whereas "size", "extent", and "length" can be interpreted both ways.  Another example of API with similar distinction is GL/Vulkan, where "size" is usually size in bytes, "extent" refers to (last - first + 1), and "count" is the number of elements. 

So I'll use "count" from now on.

I would still argue for size. I don't see how it could be interpreted as a number of increments, in c++ it always refers to the number of elements (and sometimes the number of bytes for low level C API, but here it cannot be interpreted as bytes.)

In contrast, count is not used much to refer to the number of elements, so it could well be interpreted as the number steps/increments to take.

The length of a range could also refer to length=distance to travel from the first to the fast element, that is "last-first", whereas here we want "(last-first+step)/step".

Also, look at the following code snippet:

auto r = range(first, Xxxx(n), 2);

we do want that:

size(r)==n

so its quite consistent to call it Size. This is also one more argument for an upper-case as it cannot be written with full lower-case because it would then collide with an hypothetical Eigen::size() function.


Speaking about wording, I think that we should replace "step" by "incr" (shortcut for increment) that has much fewer definitions. For instance think about:

auto r = range(first, Size(n), Step<2>);

then r.nbSteps() would have nothing to do with Step<2> and it is ambiguous: does it returns the number of increments (=size-1) or the number of iterations (=size) to go through the sequence?

 
Regarding capitalization, I kind of feel that "step" and "count" should be in lower case. Mixing extra capital letters in a short range _expression_ adds syntax noises. An _expression_ like M(all, span(1, count(10)) looks cleaner than M(all, span(1, Count(10))) to me. Also "step" is likely implemented as an overloaded free-standing function, similar to span. Free-standing functions in Eigen starts with lower case by convention, right?   If you type "range" and "all" in lower case, why is "count" capitalized? Of course, you could argue that it is a special naming convention for "named argument", but I hate to memorize new conventions...

There are several arguments above.
 

Actually, this leads to another of my axe to grind. :) I so much prefer the STL/Boost naming convention where basically everything is in snake_case, except (future) concepts and template argument are in  PascalCase. Maybe add an suffix "_" for non-public member variables, and macros are of course still ALL_UPPERCASE, but that's all the rules. With a library that uses lots of generic programming, so many templated types are used as compile-time functions or variables. Distinguishing them as types using PascalCase just feels a bit wrong. Also, long descriptor identifier names (like MaxRowsAtCompileTime) look so much better and cleaner in snake case (max_rows_at_compile_time), especially when you come back to read old code months or years later. Anyway I can probably write an essay on this, so I'll stop the rambling (till Eigen 4.0 or std::eigen or boost::eigen, maybe? ;)

Eigen was born in KDE/Qt where the conventions are different. Its just a matter of habits.
 
Another random idea: it is possible, using operator= overloading, to have the following syntax for named parameter:

 M(all, span(1, count = 10))

See e.g., BOOST_PARAMETER_NAME.  I kind of like this more than M(all, span(1, count(10))), though in this case, we cannot use the same identifier for a static count, since count is a variable and count<C> won't compile. Maybe span(1, count_ = 10) and span(1, count<10>)?

no _ in the public API please.
 
- I think we could allow unnamed steps without much ambiguity:  range(i, j, s) and range(i, Size(n), s), in particular the second version is unambiguous.

​I totally agree. 

OK, then we basically have two possibilities (neglecting some obvious overloads regarding static/dynamic):

(A)
xxxx(first,last[,incr]);
xxxx(first,last,Zzzz<INCR>);
yyyy(first,size[,incr]);
yyyy(first,Zzzz<SIZE>,Zzzz<INCR>);

versus

(B)
 
uuuu(first,last[,incr]);
uuuu(first,last,Incr<INCR>);
uuuu(first,Vvvv(size)[,incr]);
uuuu(first,Vvvv<SIZE>,Wwww<INCR>);

In both options we have to figure-out three concise, explicit, and no polluting names...

A key advantage of option (A) is that we can use Zzzz<N> elsewhere in the API, but we have to find a nice shortcut for the notion of integral_constant, and two different names to denote the same notion of an arithmetic progression/sequence. I initially proposed xxxx==range and yyyy==span.

In contrast, a key advantage of option (B) is that we can use the same name to refer to an arithmetic progression/sequence, but we have to find and introduce a new name for each parameter that can be static. It also involves a weird API for the basic uuuu(first,Vvvv(size)) use case.


So at this stage of the discutions, if we could find a good name for a generic Zzzz<N>, then I would lean for option (A).


We may or may not want support static indices. If we do, i and j can be supplied as compile time constants using the user suffix syntax, e.g., 3_c, as you described before. Alternatively, we can have the rule that indices can be supplied as template parameters to span. Consider:

V(span<3, 6>)               vs V(span(3_c, 6_c))
V(span<3, 6>(step(s))       vs V(span(3_c, 6_c, step(s)))
V(span<3, 6>(step<2>))      vs V(span(3_c, 6_c, step<2>))
V(span<4>(len(n))           vs V(span(4_c, len(n))
V(span<6>(len<8>, step(s))) vs V(span(6_c, len<8>, step(s));

All except the first row look too noisy to me, so maybe we should just not support static indices...

If we go with the span/len/step (or rather range/Size/Step) approach and want compile-time indices, then the left column seems preferable to me. I see different use cases:

1- for sparse matrices, knowing that start=0 or start=last (with step=-1) is useful. For this we could imagine something else:

    range("",j)
    range(Eigen::last,j,Step<-1>)

   We need a Eigen::last or Eigen::end "keyword" anyway to index relatively from the end. Using "" as a default start=0 might sounds odd, but it has the advantage of not having to introduce one more "keyword" Eigen::zero or Eigen::first.

​I feel strongly that using "" to mean "begin" or "all" is a step towards obscurity. It's like those operator abusing math libraries where "&" means dot product and "^" means cross product. It feels clever when writing, but hampers compression when you come back to read it months later. Hell, if we had to commit that sin, I feel using the underscore "_" would be a better replacement for "all", i.e., M(_, i) = N(_, i).

"_" can only be defined through a macro, so not an option. You can sill write range("0",j) and v(":",range(i,j)) to remember, with possibly a runtime-check to make sure no one misuse it as range("3",j), but yes that's probably far fetched, I just wanted to give it a try.

gael
 

I currently don't have a better proposal if we want to support static index and dynamic count/step... Need to think about it. 

2 - Knowing the start at compile-time might also help to compute alignment, especially in the common case of start==0, which can be handled as in the previous point. Of course, in theory for that purpose we are rather interested in knowing whether start is a multiple of some power of two numbers... but that's too far fetched!

3 - Knowing both start and stop are known at compile-time to figure-out the size at compile-time, for this we only need:

       range<START,STOP>([optional step]);
   range<START,STOP>(start,stop[,optional step]);

   that would be aliases for range(start, Size<STOP-START(+1?)>(stop-start(+1?)) [,optional step]);


gael





Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/