Re: [eigen] Let's get reshape in shape for 3.4

On Fri, Jan 13, 2017 at 6:14 AM, Jason Newton <nevion@xxxxxxxxx> wrote:

Also, regarding them RowMajor/ColMajor int/type issue - perhaps stuff
them in a new namespace or class - storage ? Too bad StorageOrder is
already used in so many places. Honestly I'm all for you making them
types and things working uniformly from there. I have used them
myself as integers with the flags bitset, but only for enable_if logic
which would be rendered obsolete if you had a collection of C++11
inspired type traits (instead they get repeated on the web a few
places). Sorry if I'm not being very detailed, it's been a while
since I've needed these, but my point is that it was basically a flaw
to use them as int's in the first place, in user code - and so I
encourage you to change things so it all works fluidly in the new api
without fear of upsetting users. Although perhaps that is a daunting
task...

I think you are mixing Eigen::RowMajor with Eigen::RowMajorBit. I agree that the bit flags could be managed differently using individual type traits, but regarding Eigen::RowMajor, it is currently used as a template parameter to Matrix, Array, SparseMatrix, etc.:

Matrix<...., RowMajor|DontAlign>

which is pretty convenient to write compared to having to subclass some default_matrix_traits class to customize the options. With RowMajor|DontAlign, RowMajor could still be instance of an integral_constant-like type with operator | overloaded.... Actually I've started to think about such an approach for Eigen::Dynamic, so that one can write:

M = N*2+1

and get M==Dynamic if N==Dynamic. Currently we always have to write: M = N==Dynamic ? Dynamic : 2*N+1 which is error prone because it's easy to forget about checking for Dynamic, especially when combining multiple compile-time identifiers.

gael

-Jason

On Thu, Jan 12, 2017 at 11:56 PM, Jason Newton <nevion@xxxxxxxxx> wrote:
> Hi Gael,
>
> Glad to see all the new api's you're moving in for the new year.
>
> I actually prefer C if C is a superset of B - that is the way it works
> in Numpy - oder is overridable in several places, but mainly things
> follow the matrix types you are working with (which would be
> expressions here).
>
> I haven't thought about the details but is there any reason
> A.reshaped(4, n/2) work via constexprs or something on the 4? I
> imagine even if it did you're trying to cover for C++98 though, but I
> think fix<4> is a fair bit ugly.
>
> As for the placeholder for a solvable dimension - the matlab
> convension is the empty matrix and I welcome that notion (warped as a
> type) - how about any of, with no priorities:
> Null, Nil, Empty, Filled, DontCare, Placeholder, CalcSize (this and
> the next are more explicit), or AutoSized
>
>
> -Jason
>
> On Thu, Jan 12, 2017 at 10:35 AM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxxx> wrote:
>>
>> Hi everyone,
>>
>> just after generic indexing/slicing, another long standing missing feature
>> is reshape. So let's make it for 3.4.
>>
>> This is not the first time we discuss it. There is a old bug report entry
>> [1]. and a old pull-request with various discussions [2]. The Tensor module
>> also support reshape [3].
>>
>> However, the feature is still not there because we never converged about how
>> to properly handle the ambiguity between col-major / row-major orders, also
>> called Fortran versus C style orders (e.g., in numpy doc [4]).
>>
>> We have several options:
>>
>> A) Interpret the indices in column major only, regardless of the storage
>> order.
>> - used in MatLab and Armadillo
>> - pros: simple strategy
>> - cons: not very friendly for row-major inputs (needs to transpose twice)
>>
>> B) Follows the storage order of the given _expression_
>> - used by the Tensor module
>> - pros: easiest implementation
>> - cons:
>> * results depends on storage order (need to be careful in generic code)
>> * not all expressions have a natural storage order (e.g., a+a^T, a*b)
>> * needs a hard copy if, e.g., the user want to stack columns of a
>> row-major input
>>
>> C) Give the user an option to decide which order to use between: ColMajor,
>> RowMajor, Auto
>> - used by numpy [4] with default to RowMajor (aka C-like order)
>> - pros: give full control to the user
>> - cons: the API is a bit more complicated
>>
>> At this stage, option C) seems to be the only reasonable one. However, we
>> yet have to specify how to pass this option at compile-time, what Auto
>> means, and what is the default strategy.
>>
>> Regarding 'Auto', it is similar to option (B) above. However, as I already
>> mentioned, some expressions do not has any natural storage order. We could
>> address this issue by limiting the use of 'Auto' to expressions for which
>> the storage order is "strongly" defined, where "strong" could mean:
>> - Any expressions with the DirectAccessBit flags (it means we are dealing
>> with a Matrix, Map, sub-matrix, Ref, etc. but not with a generic _expression_)
>> - Any _expression_ with the LinearAccessBit flag: it means the _expression_ can
>> be efficiently processed as a 1D vector.
>>
>> Any other situation would raise a static_assert.
>>
>> But what if I really don't care and just want to, e.g., get a linear view
>> with no constraints of the stacking order? Then we could add a fourth option
>> meaning 'IDontCare', perhaps 'AnyOrder' ?
>>
>>
>> For the default behavior, I would propose 'ColMajor' which is perhaps the
>> most common and predictable choice given that the default storage is column
>> major too.
>>
>>
>> Then, for the API, nothing fancy (I use c++11 for brevity):
>>
>> template<typename RowsType=Index,typename ColType=Index,typename Order=Xxxx>
>> DenseBase::reshaped(RowsType rows,ColType cols,Order = Order());
>>
>> with one variant to output a 1D array/vector:
>>
>> template<typename Order= Xxxx >
>> DenseBase.reshaped(Order = Order());
>>
>> Note that I used "reshaped" with a "d" on purpose.
>>
>> The storage order of the resulting _expression_ would match the optional
>> order.
>>
>> Then for the name of the options we cannot use "RowMajor"/"ColMajor" because
>> they already are defined as "static const int" and we need objects with
>> different types here. Moreover, col-major/row-major does not extend well to
>> multi-dimension tensors. I also don't really like the reference to Fortran/C
>> as in numpy. "Forward"/"Backward" are confusing too. Any ideas?
>>
>> The rows/cols parameters could also be a mix of compile-time & runtime
>> values, like:
>>
>> A.reshaped(fix<4>,n/2);
>>
>> And maybe we could even allow a placeholder to automatically compute one of
>> the dimension to match the given matrix size. We cannot reuse "Auto" here
>> because that would be too confusing:
>>
>> A.reshaped(5,Auto);
>>
>> Again, any ideas for a good placeholder name? (numpy uses -1 but we need a
>> compile-time identifier)
>>
>>
>> cheers,
>>
>> gael
>>
>> [1] http://eigen.tuxfamily.org/bz/show_bug.cgi?id=437
>> [2] https://bitbucket.org/eigen/eigen/pull-requests/41
>> [3]
>> https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default#markdown-header-operation-reshapeconst-dimensions-new_dims
>> [4]
>> https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.reshape.html