[eigen] Re: [Blitz-devel] Fork Blitz++?


Hello Blitzers,

Thank you all for your input on this issue.  After much consideration, I think that Blitz++ is worth keeping around.  Even in 2016, it remains unique in concept and execution.

Blitz++ claims to do one thing only: provide multi-dimensional arrays for C++, similar to those in Fortran 90, Python/Numpy, and APL.  Notice that Fortran 2008 has this feature but nothing like STL, and is widely used for scientific computing.  It is clear that multi-dim arrays are essential in a way that STL is not.  Of course, a system offering both is even more useful...

It is therefore disappointing that such arrays are still not part of the C++ standard.  This is after 15 years of work on Blitz++, boost::multi_array, etc.  Without any standard, different libraries use different classes for the same data structure, limiting compatibility between them.  Interoperability is cumbersome, and often requires unnecessary copying of multi-dimensional data structures (assuming there's enough RAM).

Many of us confuse multi-dimensional arrays with matrices and vectors.  Matrices and vectors are essential to scientific computing, and may be the most common type of arrays used.  However, they do not take the place of the more general multi-dimensional array.  This becomes clear to anyone attempting to read 5-dimensional data out of an HDF/NetCDF file.  In an ideal world, matrix and vector classes (and operations) would be built upon a foundation of rank 1 and 2 arrays, thereby enhancing compatibility between different linear algebra libraries.

By offering only multi-dimensional arrays, Blitz++ provides a foundation upon which linear algebra and other libraries may be built.  The fact that none have taken advantage of this foundation seems to be a missed opportunity.  However... until/unless something better comes along, I believe we should continue to support Blitz++.

Why is Blitz++ so good?
  1. It works, it's stable, it's well-tested.  There's really nothing wrong with it.  Sometimes, software is updated infrequently because it's not buggy.  That's a reason to KEEP it, not throw it away.
  2. It's well documented.  The manual may feel long in the tooth compared to today's manuals.  But the information you need is all there, and it works.
  3. It does one thing well and has no dependencies.  I don't have to link to BLAS in order to read a NetCDF file with Blitz++.
  4. It's versatile.  Blitz++ offers the full functionality of Fortran 90 arrays, and is able to use arrays in memory allocated by others.  There are no restrictions on the dope vectors it allows.  It is therefore REALLY useful for interfacing Fortran and Python code with C++.  And since any array-like data structure can be converted (copy-free) to a blitz::Array, writing your functions to take blitz::Array parameters is an easy way to make them flexible as well.
  5. It's FAST.  Benchmarks have shown it to be about as fast as Fortran 90 arrays.
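
To make point 4 concrete, here is a minimal sketch of the dope-vector idea in plain C++.  It is not Blitz++ code: View2D, wrap_row_major, wrap_column_major and sum are hypothetical names invented for this illustration, and the sketch covers only rank 2 with element type double.  It shows why a function written once against a descriptor (base pointer, extents, strides) can accept memory allocated by C or by Fortran without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-2 array descriptor ("dope vector"):
// a base pointer plus an extent and a stride per dimension.
// This is an illustration only, not the Blitz++ API.
struct View2D {
    double* data;              // memory owned by someone else
    std::size_t extent[2];     // number of elements along each dimension
    std::ptrdiff_t stride[2];  // element step along each dimension

    double& operator()(std::size_t i, std::size_t j) const {
        assert(i < extent[0] && j < extent[1]);
        return data[static_cast<std::ptrdiff_t>(i) * stride[0] +
                    static_cast<std::ptrdiff_t>(j) * stride[1]];
    }
};

// Wrap a C-style (row-major) buffer: the last index varies fastest.
inline View2D wrap_row_major(double* p, std::size_t rows, std::size_t cols) {
    return View2D{p, {rows, cols}, {static_cast<std::ptrdiff_t>(cols), 1}};
}

// Wrap a Fortran-style (column-major) buffer: the first index varies fastest.
inline View2D wrap_column_major(double* p, std::size_t rows, std::size_t cols) {
    return View2D{p, {rows, cols}, {1, static_cast<std::ptrdiff_t>(rows)}};
}

// Written once against the view, this works for either layout.
inline double sum(const View2D& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.extent[0]; ++i)
        for (std::size_t j = 0; j < a.extent[1]; ++j)
            total += a(i, j);
    return total;
}
```

Wrapping the same std::vector<double> with both helpers gives two views of one buffer; writing through either view changes the vector itself, because no data is ever copied.
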
In my review, I found only two other serious efforts to offer multi-dimensional arrays for C++.  Both have potentially serious problems:

* Eigen::Tensor is a relatively new part of Eigen.  In theory, it offers most/all of Blitz++ functionality.  However, it has many downsides at this point compared to Blitz++:
  1. It's a less mature product, could be evolving, could come with bugs, etc.
  2. We don't know how fast it is.  It could be a lot slower than Blitz++.
  3. It's not as well documented.  It provides some features that could be useful, but that I just didn't know were there because I couldn't find them in the documentation.
  4. If the authors do a good job, they will end up with a library similar in functionality and speed to Blitz++.  I just don't see the upside to re-developing this functionality from scratch.
* boost::multi_array showed initial promise as part of the Boost ecosystem, and therefore (possibly) on track to becoming a C++ standard.  However, it has languished in Boost, apparently with little or no interest.  This project is even deader than Blitz++.  Maybe one reason is that benchmarks showed it to be about half as fast as Blitz++.  We must accept that people who want multi-dimensional arrays to be part of their language standard will have to use Fortran for many years to come:

"I don't know what the language of the year 2000 will look like, but I know it will be called Fortran." -- Tony Hoare [CarHoare], apparently on a card distributed during the 1982 AFIPS National Computing Conference.

A number of other libraries exist that provide specialized matrix and vector classes.  I do not need to mention them here because they don't provide multi-dimensional arrays.


I propose the following roadmap for the continued viability of Blitz++:
  1. We get volunteers.  Please contact me if you are interested in any of the tasks listed below.
  2. We move the repo to github.com/oonumerics/blitz.  In the past, Blitz++ was hosted by oonumerics.  Once this new repo is set up, we remove Blitz++ code from SourceForge and direct people to GitHub.  We also move the mailing list and try to move the mailing list archives, if possible.
  3. Once things are moved, we can start putting updates into a ticket system.  I would suggest the following priorities:
    1. Reformatted documentation, posted on-line.  Doxygen docs would be nice.
    2. Identify and seek resolution on any warnings Blitz++ might be causing with modern compilers.  We need to assure users that Blitz++ won't just stop working some day.
    3. Release the results of this as "Blitz 1.0".
  4. We then start working on Blitz 2.0, which will make use of features in C++11 and beyond.  C++98 users can continue to use Blitz 1.0.  Updates might include:
    1. Replace TinyVector with std::array.
    2. Much has changed now that std::unique_ptr<> and std::shared_ptr<> are part of the C++ standard.  It would be worth seeing if some of the shared_ptr functionality currently built into Blitz++ is worth factoring out.
  5. Consider adding new functionality that will make blitz::Array more useful out-of-the-box.  I'm envisioning some stuff I've already written as little "utility" functions, but we'd want to do it right before putting it into the library.  Possibilities include:
    1. A reshape() function.
    2. Easy ways to read/write to NetCDF files.  (This would ONLY be compiled if Blitz++ is configured with NetCDF dependency.  We don't want to add any REQUIRED dependencies).
    3. Copy-free conversions between blitz::Array and std::vector.
    4. Conversions between blitz::Array, Numpy arrays and Fortran 90 arrays.
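
As a rough illustration of three of the ideas above (std::array in the role TinyVector plays today, a copy-free view over a std::vector, and reshape()), here is a small sketch in plain C++11.  Everything in it is an assumption made for illustration: View, view_of and reshape are hypothetical names, not existing or proposed Blitz++ API, and the sketch handles only contiguous row-major storage of double.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rank-N, row-major, non-owning view.  The shape lives in a
// std::array.  Illustration only, not the Blitz++ API.
template <std::size_t N>
struct View {
    double* data;
    std::array<std::size_t, N> shape;

    // Total number of elements: the product of the extents.
    std::size_t size() const {
        std::size_t n = 1;
        for (std::size_t e : shape) n *= e;
        return n;
    }

    // Row-major offset, accumulated dimension by dimension.
    double& operator()(const std::array<std::size_t, N>& idx) const {
        std::size_t off = 0;
        for (std::size_t d = 0; d < N; ++d) {
            assert(idx[d] < shape[d]);
            off = off * shape[d] + idx[d];
        }
        return data[off];
    }
};

// Copy-free view over a std::vector.  The vector must outlive the view
// and must not reallocate while the view is in use.
template <std::size_t N>
View<N> view_of(std::vector<double>& v,
                const std::array<std::size_t, N>& shape) {
    View<N> out{v.data(), shape};
    assert(out.size() == v.size());
    return out;
}

// reshape(): same memory, new shape; the element count must match.
template <std::size_t M, std::size_t N>
View<M> reshape(const View<N>& in,
                const std::array<std::size_t, M>& shape) {
    View<M> out{in.data, shape};
    assert(out.size() == in.size());
    return out;
}
```

With a six-element std::vector<double> v, a call such as view_of<2>(v, {2, 3}) gives a 2x3 view, and reshape<2>(a, {3, 2}) reinterprets the same six elements as 3x2.  A real implementation would also have to deal with strided and column-major arrays, where a copy-free reshape is not always possible.
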
In any case, please share your comments and suggestions on this roadmap.  (And please volunteer too!)

Thank you,
-- Elizabeth
