Re: [eigen] [Review] Pull request 66, Huge Tensor module improvements

The main purpose of the exercise for me was to develop a proof of concept for the .device() mechanism. It currently works quite well for coefficient-wise operations such as tensor additions, but is fairly inefficient for operations such as contractions or convolutions. I'll be making more changes in order to make the evaluation of a contraction look more like the gebp matrix code.
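
For reference, the intended usage is roughly the following (a sketch only; class names such as ThreadPoolDevice may still change):

    // Run a coefficient-wise assignment on a chosen device instead of
    // the default single-threaded CPU path:
    Eigen::Tensor<float, 3> A(20, 30, 40), B(20, 30, 40), C(20, 30, 40);
    Eigen::ThreadPoolDevice my_device(/* thread pool parameters */);
    C.device(my_device) = A + B;  // evaluation is dispatched to the pool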

See my comments below on Christian's points.

Benoit
 
1. C++03 compatibility emulation. I see that you went down that rabbit
   hole and came out alive and well, where I thought it would be far
   too complicated to emulate that with lots of variants of templates.
   Seeing your code, I actually see it's quite doable and not really
   that difficult. But I think there are some things that can be
   improved:

   a. Since Tensor/ does not necessarily need C++11 anymore, we should
      probably move it to unsupported/Eigen/Tensor. I don't see you
      implementing TensorSymmetry/ in C++03 however, so I think that
      will have to stay under unsupported/Eigen/CXX11/.
 
The emulation code is not perfect, and one gets better functionality with C++11, so it might be worth keeping it explicit that there is a dependency on C++11.

   b. Currently, since you detect the C++ version to decide which header
      to include, this breaks people (read: me ;-)) using
      TensorSymmetry/ if they accidentally compile it with C++ <= 03 -
      instead of getting a nice error message, they get lots of
      substitution errors, because the C++11 version of the header is
      not included.

      I don't think detection and emulation is the wrong way to go, but
      maybe the best thing is (after doing (1a)) to just write a header
      CXX11/CoreOrEmulated that includes CXX11/Core or CXX11/Emulated
      depending on the compiler version. Tensor/ would then include
      CXX11/CoreOrEmulated, but TensorSymmetry/ would include
      CXX11/Core.

      (Obviously, the tensor tests should then be named differently.)

Sounds reasonable. I'll update the code. 

   c. Eigen::array is a bad name decision for emulating std::array,
      since array kind-of-clashes with Eigen::Array - it's going to be
      hell for people using Eigen's traditional capital-A Array
      template if they accidentally have a typo.

      Maybe Eigen::std_array instead? With a macro EIGEN_STD_ARRAY that
      maps to either std::array (C++11) or Eigen::std_array?

      Btw.
      template <typename T, std::size_t N> using array =
          std::array<T, N>;
      That requires g++ 4.7 or 4.8, my code (even the crazy template
      stuff for TensorSymmetry) supports g++ 4.6 and I'd prefer to keep
      it that way. (Some older computing clusters still have only g++
      4.6 available.)

      Btw2. In array_apply_and_reduce etc. you use a for loop instead of
      recursion. From my experience, compilers do not automatically
      tend to unroll this loop, making it difficult for the optimizer
      to generate optimal code here. EIGEN_STRONG_INLINE + recursion is
      better in my experience.
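
For illustration, the recursive pattern described above looks roughly like this (a generic sketch, not the actual array_apply_and_reduce code):

    #include <cstddef>

    // Index recursion: every step is forced inline, so the "loop" is
    // fully unrolled at compile time instead of relying on the optimizer.
    template <std::size_t I, std::size_t N>
    struct sum_helper {
      template <typename T>
      static EIGEN_STRONG_INLINE T run(const T (&a)[N]) {
        return a[I] + sum_helper<I + 1, N>::run(a);
      }
    };
    template <std::size_t N>
    struct sum_helper<N, N> {  // termination case
      template <typename T>
      static EIGEN_STRONG_INLINE T run(const T (&)[N]) { return T(0); }
    };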

   d. I don't like the Sizes<> template. I think it's better to just
      specify the sizes for fixed size tensors directly.
      Example:
          Tensor<double, 3, /* Options = */ 0, 3, 3, 3> epsilon;
          (3x3x3 tensor)
      Compare to:
          Matrix<double, 3, 3> foo;
          (3x3 matrix)
      Of course, we have to put Options before the variadic arguments
      here, so there is some difference, but I still think this is more
      consistent with the rest of Eigen here.
 
I don't like the Sizes template much either. At the time I couldn't find a way to emulate a variadic version of the TensorFixedSize template classes without having to rewrite a fair amount of code. Over time I moved most of that code to TensorBase and TensorDimensions, so it should now be possible to avoid the Sizes class entirely in the public API.
 
   e. Nitpicking: you support up to 5 indices in C++03, but IIRC Fortran
      supports 7, so maybe we should match that?

I'll need to go to 8 in the future ;)
 

   f. Unsure: enable_if could probably be used instead of a lot of
      static asserts. OTOH, maybe using assertions is not the worst
      idea here. Unsure about the best course of action here.
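
For comparison, the two mechanisms look like this in isolation (a generic C++ sketch using the C++11 spellings, not code from the patch):

    #include <type_traits>

    // (a) static_assert: the overload always participates, but a
    //     violation produces a readable, explicit error message.
    template <typename T>
    void f_assert(T) {
      static_assert(std::is_floating_point<T>::value,
                    "f_assert requires a floating-point type");
    }

    // (b) enable_if: the overload is silently removed from overload
    //     resolution, which composes better but yields worse errors.
    template <typename T>
    typename std::enable_if<std::is_floating_point<T>::value>::type
    f_enable(T) {}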

2. GPU / Threading support: So you probably implemented this here,
   because you needed it yourself. I'm all for it, and the
   EIGEN_DEVICE_FUNC additions definitely should go in (they're in
   Eigen/Core after all). And while I think it's really cool what you
   did there, with .device() and all that, I think this is something
   that affects all of Eigen and I wouldn't want to add this to the
   Tensor module without first having a more general discussion about
   it.

   Btw. do you know how your own thread pool implementation compares to
   a simple openmp parallelization (-fopenmp with g++ and #pragma omp
   parallel for in front of the assignment loop for example)? I've
   never tried std::launch::async so far and I have no idea how much
   overhead it has.
 
The overhead of std::launch::async is implementation dependent. Creating threads can be really cheap on some systems. Ideally I'd use a real thread pool to avoid creating threads entirely, but I couldn't find a standard thread pool library.
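
For comparison, here is a minimal sketch of the two options (not the code from the patch):

    #include <future>
    #include <vector>

    // std::async version: one task per chunk of the assignment loop.
    void parallel_assign(float* out, const float* a, const float* b,
                         int n, int num_threads) {
      std::vector<std::future<void> > futures;
      const int chunk = n / num_threads;
      for (int t = 0; t < num_threads; ++t) {
        const int begin = t * chunk;
        const int end = (t == num_threads - 1) ? n : begin + chunk;
        futures.push_back(std::async(std::launch::async, [=] {
          for (int i = begin; i < end; ++i) out[i] = a[i] + b[i];
        }));
      }
      for (auto& f : futures) f.wait();
    }

    // OpenMP version: the same loop with -fopenmp.
    //   #pragma omp parallel for
    //   for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];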

 
3. General class structure:

   a. I don't like two classes TensorFixedSize and Tensor. I think
      there should be only one class, like Matrix or Vector.

I'd also like to see them merged. This would be cleaner and would open the door to a mixed tensor for which some dimensions are known at compile time and others only at run time.
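
Purely hypothetical syntax for such a merged class (following the Options-before-sizes layout proposed in 1d; nothing like this exists in the patch yet):

    // A rank-3 tensor whose middle dimension n is only known at run time:
    Tensor<float, 3, /* Options = */ 0, 4, Dynamic, 4> t(4, n, 4);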

   b. You only have one base class right now, which is fine for the
      features implemented so far, but I think at this stage we should
      really go and replicate the entire class hierarchy of Eigen,
      even if some subclasses are not really needed right now, just to
      make it easier later to add more features.

I don't think we need to replicate the entire Eigen class hierarchy. The tensor class is a generalization of the Eigen Array class, but I don't see an equivalent to the Eigen Matrix class in the tensor world. As a result, we don't need to introduce equivalents to the Matrix, MatrixBase, DenseBase, and DenseCoeffsBase classes. If/when we add support for sparse tensors we'll need a common base class for the dense and sparse tensor hierarchies (similar to EigenBase), but this can wait.
 

4. General observations:

   a. For all evaluators, you assume something like LinearAccessBit,
      which is fine as long as one doesn't implement some kind of
      generalized .block() for tensors and/or Maps with strides.

That's right. It will be tricky to figure out how to properly vectorize an expression that contains calls to .block() and .reshape(), so I punted on this completely. There are two possible, non-exclusive short-term solutions:
  * disable all vectorization whenever a .block() expression is encountered, or whenever a tensor map is created with a non-default stride;
  * evaluate the .block() expression immediately (when used in an rvalue expression) by making a dense copy that can be accessed linearly.
 

   b. The HasBlend / pblend() stuff is probably also something that
      could benefit the rest of Eigen, right? OTOH, the for loop over
      the packet size seems potentially inefficient to me (at least on
      CPUs) - can't you just do a packet() on m_condImpl and store that
      in the selector?
 
The downside of the pblend approach is that the then and else expressions always have to be evaluated. When using a packet of 4 there is a 12.5% chance that one of them could be skipped completely (assuming independent 50/50 conditions, a packet is all-true or all-false with probability 1/16 each, i.e. 2/16 = 12.5%), so the non-vectorized implementation of select might be faster in some cases. Before using the pblend instruction in Eigen Core I'd like to update the evaluation cost model to reflect this.
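
For reference, on SSE4.1 a blend boils down to a single instruction; a minimal illustration (not Eigen's actual pblend implementation) of why both sides must be evaluated:

    #include <immintrin.h>  // SSE4.1

    // then_v and else_v must both already be computed for all four lanes;
    // the mask only picks per-lane results, so neither branch is skipped.
    __m128 select4(__m128 mask, __m128 then_v, __m128 else_v) {
      return _mm_blendv_ps(else_v, then_v, mask);
    }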

   c. w.r.t. writePacket() - you've kind of answered the question I
      posed in the original posting of this thread, namely that it
      is only called from inside Eigen and therefore doesn't need
      variadic arguments.

   d. Nitpicking: Why did you change #include <...> to #include "..."
      for things that should be in the include path? The only things
      in "..." should be references to src/... from within the main
      include headers, at least if I follow the style of the rest of
      Eigen.
 
 #include "..." for non system includes seems to be the default in Eigen (at least based on the Eigen/Core header)

5. API stuff:

   a. "Tensor" class prefix. For the internal classes you obviously
      went the TensorXXX route, i.e. TensorEvaluator, TensorCwiseXXXOp
      etc. Since I raise this as a question in this thread, I thought
      about it some more and it's probably the easiest solution.

      That said, I really don't like names like TensorMap<> etc. that
      people will then use from the public API. Also, later on there's
      probably going to be something like Ref<> etc. I think it would
      be better to reuse the same names for the Tensor module, and I
      think I have an idea how to do that without making the rest of
      Eigen fall apart, but I have to try to see if it works first.
 
The Map code assumes a 2D layout in several places. For example, the notion of inner and outer strides assumes there are exactly 2 strides. Similarly, referring to a coefficient can be done by row and column. On the other hand, n-dimensional accessors are missing. Long term I'd like to see the Tensor code moved into Eigen Core and TensorMap merged with the Map class, but for the time being I thought it was safer to create a separate TensorMap class.

      Also, w.r.t. GPU/Threading: TensorDevice seems a really weird
      name for what it's doing. Probably best to create a module
      unsupported/Eigen/Parallelization, name the class ComputeDevice
      or something along those lines and make Tensor make use of that
      module.
 
Sounds good. 

   b. I don't like the API for convolutions and contractions. It took me
      a while to understand what you are doing. (I think without the
      tests I'd still not understand it.) OTOH, any better API is
      going to be a *LOT* more complicated to implement and it's
      probably better to have something that's useful instead of
      nothing. Still, I don't like "burning" names such as "convolve"
      or "contract" for this, because at some later point with a better
      API that could come to bite us.
 

--
Benoit

