Hi everyone,
just after generic indexing/slicing, another long-standing missing feature is reshape. So let's add it for 3.4.
This is not the first time we have discussed it. There is an old bug report entry [1] and an old pull request with various discussions [2]. The Tensor module also supports reshape [3].
However, the feature is still not there because we never converged on how to properly handle the ambiguity between col-major and row-major orders, also called Fortran versus C style orders (e.g., in the numpy doc [4]).
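To make the ambiguity concrete, here is a minimal standalone sketch (plain C++, no Eigen) that reshapes the same 2x3 matrix into 3x2 under both interpretations. The Mat type and the two reshape_* helpers are hypothetical names used only for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense matrix stored in column-major order.
// Hypothetical helper type for this sketch, not part of Eigen.
struct Mat {
    std::size_t rows, cols;
    std::vector<int> data;  // element (i, j) lives at data[i + j * rows]
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}
    int& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    int operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Reshape by enumerating elements column by column (Fortran-style order).
Mat reshape_col_order(const Mat& a, std::size_t r, std::size_t c) {
    assert(r * c == a.rows * a.cols);
    Mat out(r, c);
    for (std::size_t k = 0; k < r * c; ++k)
        out(k % r, k / r) = a(k % a.rows, k / a.rows);
    return out;
}

// Reshape by enumerating elements row by row (C-style order).
Mat reshape_row_order(const Mat& a, std::size_t r, std::size_t c) {
    assert(r * c == a.rows * a.cols);
    Mat out(r, c);
    for (std::size_t k = 0; k < r * c; ++k)
        out(k / c, k % c) = a(k / a.cols, k % a.cols);
    return out;
}
```

For A = [1 2 3; 4 5 6], the column-order reshape gives [1 5; 4 3; 2 6] while the row-order reshape gives [1 2; 3 4; 5 6], so the two conventions disagree on every entry except the first and the last.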
We have several options:
A) Interpret the indices in column major only, regardless of the storage order.
- used in MATLAB and Armadillo
- pros: simple strategy
- cons: not very friendly for row-major inputs (needs to transpose twice)
B) Follow the storage order of the given expression
- used by the Tensor module
- pros: easiest implementation
- cons:
* results depend on storage order (need to be careful in generic code)
* not all expressions have a natural storage order (e.g., a+a^T, a*b)
* needs a hard copy if, e.g., the user wants to stack columns of a row-major input
C) Give the user an option to decide which order to use between: ColMajor, RowMajor, Auto
- used by numpy [4] with default to RowMajor (aka C-like order)
- pros: gives full control to the user
- cons: the API is a bit more complicated
At this stage, option C) seems to be the only reasonable one. However, we still have to specify how to pass this option at compile time, what Auto means, and what the default strategy is.
Regarding 'Auto', it is similar to option (B) above. However, as I already mentioned, some expressions do not have any natural storage order. We could address this issue by limiting the use of 'Auto' to expressions for which the storage order is "strongly" defined, where "strong" could mean:
- Any expression with the DirectAccessBit flag (it means we are dealing with a Matrix, Map, sub-matrix, Ref, etc. but not with a generic expression)
- Any expression with the LinearAccessBit flag: it means the expression can be efficiently processed as a 1D vector.
Any other situation would raise a static_assert.
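As a rough sketch of how such a compile-time check could look, here is a standalone C++11 example. All names and flag values below (kDirectAccessBit, kLinearAccessBit, kRowMajorBit, the three expression types, resolve_auto_order) are stand-ins invented for this illustration; they only mimic the idea of the flags and are not Eigen's actual definitions:

```cpp
#include <cassert>

// Stand-in flag values for this sketch only; Eigen defines its own constants.
constexpr unsigned kDirectAccessBit = 0x1;
constexpr unsigned kLinearAccessBit = 0x2;
constexpr unsigned kRowMajorBit     = 0x4;

// Hypothetical expression types, each exposing a Flags constant.
struct PlainMatrix      { static constexpr unsigned Flags = kDirectAccessBit | kLinearAccessBit; };
struct RowMajorMap      { static constexpr unsigned Flags = kDirectAccessBit | kRowMajorBit; };
struct SumWithTranspose { static constexpr unsigned Flags = 0; };  // e.g. a + a^T

// "Strong" storage order: direct access or linear access is available.
template <typename Expr>
constexpr bool has_strong_storage_order() {
    return (Expr::Flags & (kDirectAccessBit | kLinearAccessBit)) != 0;
}

enum class ResolvedOrder { ColMajor, RowMajor };

// Resolve 'Auto' to a concrete order, or refuse to compile.
template <typename Expr>
constexpr ResolvedOrder resolve_auto_order() {
    static_assert(has_strong_storage_order<Expr>(),
                  "Auto order requires an expression with a well-defined storage order");
    return (Expr::Flags & kRowMajorBit) ? ResolvedOrder::RowMajor
                                        : ResolvedOrder::ColMajor;
}

static_assert(resolve_auto_order<PlainMatrix>() == ResolvedOrder::ColMajor, "");
static_assert(resolve_auto_order<RowMajorMap>() == ResolvedOrder::RowMajor, "");
// resolve_auto_order<SumWithTranspose>() would trigger the static_assert.
```

With this kind of guard, generic code that asks for 'Auto' on something like a+a^T fails at compile time instead of silently picking an order.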
But what if I really don't care and just want to, e.g., get a linear view with no constraints on the stacking order? Then we could add a fourth option meaning 'IDontCare', perhaps 'AnyOrder'?
For the default behavior, I would propose 'ColMajor' which is perhaps the most common and predictable choice given that the default storage is column major too.
Then, for the API, nothing fancy (I use C++11 for brevity):
template<typename RowsType=Index,typename ColType=Index,typename Order=Xxxx>
DenseBase::reshaped(RowsType rows,ColType cols,Order = Order());
with one variant to output a 1D array/vector:
template<typename Order=Xxxx>
DenseBase::reshaped(Order = Order());
Note that I used "reshaped" with a "d" on purpose.
The storage order of the resulting expression would match the optional order.
Then for the name of the options we cannot use "RowMajor"/"ColMajor" because they are already defined as "static const int" and we need objects with different types here. Moreover, col-major/row-major does not extend well to multi-dimensional tensors. I also don't really like the reference to Fortran/C as in numpy. "Forward"/"Backward" are confusing too. Any ideas?
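To illustrate why objects with different types are needed, here is a minimal standalone sketch of the tag-dispatch pattern together with the "Order = Order()" default argument. The names ColOrderTag, RowOrderTag, ColOrder, RowOrder and the linear_index helpers are placeholders for this example only, not naming proposals:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Placeholder tag types: one distinct type per option, so the choice is
// visible to overload resolution at compile time.
struct ColOrderTag {};
struct RowOrderTag {};

// Tag objects that a caller would pass as the last argument.
constexpr ColOrderTag ColOrder{};
constexpr RowOrderTag RowOrder{};

// Linear index of (i, j) in a rows x cols array, one overload per tag type.
constexpr std::size_t linear_index(std::size_t i, std::size_t j,
                                   std::size_t rows, std::size_t /*cols*/,
                                   ColOrderTag) {
    return i + j * rows;
}
constexpr std::size_t linear_index(std::size_t i, std::size_t j,
                                   std::size_t /*rows*/, std::size_t cols,
                                   RowOrderTag) {
    return i * cols + j;
}

// Same "Order = Order()" default-argument pattern as in the proposed API;
// the default here is the column-order tag.
template <typename Order = ColOrderTag>
constexpr std::size_t linear_index_default(std::size_t i, std::size_t j,
                                           std::size_t rows, std::size_t cols,
                                           Order order = Order()) {
    return linear_index(i, j, rows, cols, order);
}

static_assert(!std::is_same<ColOrderTag, RowOrderTag>::value,
              "tags must be distinct types");
```

A plain integer constant could not select between the two overloads, because both values would have the same type; distinct empty structs make the selection a pure compile-time decision with no runtime cost.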
The rows/cols parameters could also be a mix of compile-time & runtime values, like:
A.reshaped(fix<4>,n/2);
And maybe we could even allow a placeholder to automatically compute one of the dimensions to match the given matrix size. We cannot reuse "Auto" here because that would be too confusing:
A.reshaped(5,Auto);
Again, any ideas for a good placeholder name? (numpy uses -1 but we need a compile-time identifier)
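For what it's worth, here is a small standalone sketch of how a mix of runtime sizes, compile-time sizes, and a deduction placeholder could be resolved through overloading. FixedSize, DeduceSize, Dims, to_size and resolve_dims are all hypothetical names for this illustration; in particular FixedSize<N> only stands in for the idea behind fix<N>:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical compile-time size wrapper (stand-in for the fix<N> idea).
template <std::size_t N>
struct FixedSize {};

// Hypothetical placeholder type: "compute this dimension for me".
struct DeduceSize {};

struct Dims { std::size_t rows, cols; };

// Convert either a runtime size or a compile-time size to a plain value.
inline std::size_t to_size(std::size_t n) { return n; }
template <std::size_t N>
constexpr std::size_t to_size(FixedSize<N>) { return N; }

// Both dimensions given explicitly: only check consistency.
template <typename R, typename C>
Dims resolve_dims(std::size_t total, R rows, C cols) {
    Dims d{to_size(rows), to_size(cols)};
    assert(d.rows * d.cols == total);
    return d;
}

// Number of columns deduced from the total size.
template <typename R>
Dims resolve_dims(std::size_t total, R rows, DeduceSize) {
    const std::size_t r = to_size(rows);
    assert(r != 0 && total % r == 0);
    return Dims{r, total / r};
}

// Number of rows deduced from the total size.
template <typename C>
Dims resolve_dims(std::size_t total, DeduceSize, C cols) {
    const std::size_t c = to_size(cols);
    assert(c != 0 && total % c == 0);
    return Dims{total / c, c};
}
```

For a matrix with 20 coefficients, resolve_dims(20, 5, DeduceSize{}) yields 5x4, which corresponds to the A.reshaped(5,Auto) call above. Passing the placeholder for both dimensions is ambiguous between the last two overloads and is therefore rejected at compile time.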
cheers,
gael