Re: [eigen] [Review] Pull request 66, Huge Tensor module improvements
1. C++03 compatibility emulation. I see that you went down that rabbit
hole and came out alive and well, whereas I thought it would be far
too complicated to emulate this with lots of template variants.
Looking at your code, it's actually quite doable and not really
that difficult. But I think there are some things that can be
improved:
a. Since Tensor/ does not necessarily need C++11 anymore, we should
probably move it to unsupported/Eigen/Tensor. I don't see you
implementing TensorSymmetry/ in C++03, however, so I think that
will have to stay under unsupported/Eigen/CXX11/.
b. Currently, since you detect the C++ version to decide which header
to include, this breaks people (read: me ;-)) using
TensorSymmetry/ if they accidentally compile it with C++ <= 03 -
instead of getting a nice error message, they get lots of
substitution errors, because the C++11 version of the header is
not included.
I don't think detection and emulation is the wrong way to go, but
maybe the best thing is (after doing (1a)) to just write a header
CXX11/CoreOrEmulated that includes CXX11/Core or CXX11/Emulated
depending on the compiler version; see the sketch after this
list. Tensor/ would then include CXX11/CoreOrEmulated, but
TensorSymmetry/ would include CXX11/Core.
(Obviously, the tensor tests should then be named differently.)
c. Eigen::array is a bad naming decision for emulating std::array,
since array kind of clashes with Eigen::Array - it's going to be
hell for people using Eigen's traditional capital-A Array
template if they accidentally make a typo.
Maybe Eigen::std_array instead? With a macro EIGEN_STD_ARRAY that
maps to either std::array (C++11) or Eigen::std_array (C++03)?
(A sketch of that mapping follows after this list.)
Btw., a template alias like
    template <typename T, std::size_t N> using array = std::array<T, N>;
requires at least g++ 4.7; my code (even the crazy template stuff
for TensorSymmetry) supports g++ 4.6 and I'd prefer to keep it
that way. (Some older computing clusters still only have g++ 4.6
available.)
Btw2. In array_apply_and_reduce etc. you use a for loop instead of
recursion. In my experience, compilers do not reliably unroll
such a loop, which makes it hard for the optimizer to generate
optimal code here; EIGEN_STRONG_INLINE + recursion works better
(see the sketch after this list).
d. I don't like the Sizes<> template. I think it's better to just
specify the sizes for fixed size tensors directly.
Example:
Tensor<double, 3, /* Options = */ 0, 3, 3, 3> epsilon;
(3x3x3 tensor)
Compare to:
Matrix<double, 3, 3> foo;
(3x3 matrix)
Of course, we have to put Options before the variadic arguments
here, so there is some difference (see the declaration sketch
after this list), but I still think this is more consistent with
the rest of Eigen.
e. Nitpicking: you support up to 5 indices in C++03, but IIRC Fortran
supports 7, so maybe we should match that?
f. Unsure: enable_if could probably be used instead of a lot of
static asserts. OTOH, maybe using assertions is not the worst
idea here. Unsure about the best course of action.
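
To make (1b) concrete, the dispatch header could look roughly like
this (file names as suggested above; the __cplusplus check would
probably need to be combined with compiler-specific checks, since
older g++ versions misreport it). Just a sketch, not a finished patch:

    // unsupported/Eigen/CXX11/CoreOrEmulated (name purely illustrative)
    // Pull in the real C++11 core or the C++03 emulation layer depending
    // on the language level. Tensor/ would include this header, while
    // TensorSymmetry/ keeps including CXX11/Core directly.
    #if defined(__cplusplus) && __cplusplus >= 201103L
    #include "Core"
    #else
    #include "Emulated"
    #endif

CXX11/Core itself could then start with an explicit #error for
C++ <= 03, so that TensorSymmetry/ users get a clear message instead
of a pile of substitution errors.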
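
Similarly, for (1c), the EIGEN_STD_ARRAY mapping I have in mind would
be something along these lines (the emulated class is stripped down to
the bare idea here; the real thing would of course provide the full
std::array interface):

    #if defined(__cplusplus) && __cplusplus >= 201103L
      #include <array>
      #define EIGEN_STD_ARRAY std::array
    #else
      #include <cstddef>
      namespace Eigen {
        // minimal C++03 stand-in for std::array (illustrative only)
        template <typename T, std::size_t N>
        struct std_array {
          T elems[N];
          T& operator[](std::size_t i) { return elems[i]; }
          const T& operator[](std::size_t i) const { return elems[i]; }
        };
      }
      #define EIGEN_STD_ARRAY Eigen::std_array
    #endif

    // usage is then identical in both modes:
    // EIGEN_STD_ARRAY<int, 3> dims;

Since EIGEN_STD_ARRAY is an object-like macro, the commas in the
template argument list are not a problem, and no template aliases
(i.e. g++ >= 4.7) are needed.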
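
As for the Btw2 about array_apply_and_reduce: schematically, what I
mean by EIGEN_STRONG_INLINE + recursion is the following (the names
and the hard-coded sum reduction are only illustrative, not the PR's
actual helpers):

    #include <cstddef>
    #include <Eigen/Core> // for EIGEN_STRONG_INLINE

    namespace Eigen { namespace internal {

    // Apply 'op' to every element of a length-N C array and sum the
    // results. The "loop" is compile-time recursion, so it is unrolled
    // by construction, and EIGEN_STRONG_INLINE keeps the call chain
    // flat for the optimizer.
    template <typename Op, typename T, std::size_t N, std::size_t I = 0>
    struct apply_and_sum_impl {
      static EIGEN_STRONG_INLINE T run(const Op& op, const T* a) {
        return op(a[I]) + apply_and_sum_impl<Op, T, N, I + 1>::run(op, a);
      }
    };

    // Recursion terminator: all N elements have been consumed.
    template <typename Op, typename T, std::size_t N>
    struct apply_and_sum_impl<Op, T, N, N> {
      static EIGEN_STRONG_INLINE T run(const Op&, const T*) { return T(0); }
    };

    template <std::size_t N, typename Op, typename T>
    EIGEN_STRONG_INLINE T apply_and_sum(const Op& op, const T (&a)[N]) {
      return apply_and_sum_impl<Op, T, N>::run(op, a);
    }

    }} // namespace Eigen::internal

This is plain C++03, so it also fits the emulation layer.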
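
And for (1d), the declaration I have in mind would be roughly the
following (hypothetical C++11 spelling, not the PR's current
signature):

    // The fixed sizes form a trailing parameter pack, which is why
    // Options has to move in front of them, unlike
    // Matrix<Scalar, Rows, Cols, Options>.
    template <typename Scalar_, std::size_t NumIndices_, int Options_ = 0,
              std::size_t... FixedSizes_>
    class Tensor;

    // Tensor<double, 3, 0, 3, 3, 3> epsilon;  // fixed-size 3x3x3 tensor
    // Tensor<double, 3> t;                    // dynamically sized rank-3 tensor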
2. GPU / Threading support: So you probably implemented this here,
because you needed it yourself. I'm all for it, and the
EIGEN_DEVICE_FUNC additions definitely should go in (they're in
Eigen/Core after all). And while I think it's really cool what you
did there, with .device() and all that, I think this is something
that affects all of Eigen and I wouldn't want to add this to the
Tensor module without first having a more general discussion about
it.
Btw. do you know how your own thread pool implementation compares to
a simple OpenMP parallelization (-fopenmp with g++ and #pragma omp
parallel for in front of the assignment loop, for example; see the
sketch below)? I've never tried std::launch::async so far and I have
no idea how much overhead it has.
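
For reference, the OpenMP baseline I would compare against is as
simple as this (schematic: plain arrays stand in for the tensor
evaluators, this is not the PR's code; compile with g++ -fopenmp):

    #include <cstddef>

    // Plain element-wise assignment loop, split across threads by OpenMP.
    void parallel_assign(double* dst, const double* src, std::ptrdiff_t n)
    {
      #pragma omp parallel for
      for (std::ptrdiff_t i = 0; i < n; ++i)
        dst[i] = src[i];
    }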
3. General class structure:
a. I don't like having two classes, TensorFixedSize and Tensor. I
think there should be only one class, just like Matrix covers
both the fixed-size and the dynamic case.
b. You only have one base class right now, which is fine for the
features implemented so far, but I think at this stage we should
really go and replicate the entire class hierarchy of Eigen,
even if some subclasses are not really needed right now, just to
make it easier later to add more features.
4. General observations:
a. For all evaluators, you assume something like LinearAccessBit,
which is fine as long as one doesn't implement some kind of
generalized .block() for tensors and/or Maps with strides.
b. The HasBlend / pblend() stuff is probably also something that
could benefit the rest of Eigen, right? OTOH, the for loop over
the packet size seems potentially inefficient to me (at least on
CPUs) - can't you just do a packet() on m_condImpl and store that
in the selector?
c. w.r.t. writePacket() - you've kind of answered the question I
posed in the original posting of this thread, namely that it
is only called from inside Eigen and therefore doesn't need
variadic arguments.
d. Nitpicking: Why did you change #include <...> to #include "..."
for things that should be in the include path? The only things
in "..." should be references to src/... from within the main
include headers, at least if I follow the style of the rest of
Eigen.
5. API stuff:
a. "Tensor" class prefix. For the internal classes you obviously
went the TensorXXX route, i.e. TensorEvaluator, TensorCwiseXXXOp
etc. Since I raised this as a question in this thread, I thought
about it some more, and it's probably the easiest solution.
That said, I really don't like names like TensorMap<> etc. that
people will then use from the public API. Also, later on there's
probably going to be something like Ref<> etc. I think it would
be better to reuse the same names in the Tensor module, and I
think I have an idea how to do that without making the rest of
Eigen fall apart, but I have to try it to see if it works first.
Also, w.r.t. GPU/Threading: TensorDevice seems a really weird
name for what it's doing. Probably best to create a module
unsupported/Eigen/Parallelization, name the class ComputeDevice
or something along those lines, and have Tensor use that
module.
b. I don't like the API for convolutions and contractions. It took me
a while to understand what you are doing. (I think without the
tests I'd still not understand it.) OTOH, any better API is
going to be a *LOT* more complicated to implement and it's
probably better to have something that's useful instead of
nothing. Still, I don't like "burning" names such as "convolve"
or "contract" for this, because at some later point with a better
API that could come to bite us.