The Apache Mahout PMC is pleased to announce the release of Mahout 0.11.0.
Mahout's goal is to provide an environment for quickly building machine learning applications
that scale and run on the highest-performance parallel computation engines available. Mahout
comprises an interactive environment and library that supports generalized scalable linear
algebra and includes many modern machine learning algorithms.
The Mahout Math environment, which we call “Samsara” for its symbolism of universal renewal,
reflects a fundamental rethinking of how scalable machine learning algorithms are built and
customized. Mahout-Samsara is here to help people create their own math while providing some
off-the-shelf algorithm implementations. At its base are general linear algebra and statistical
operations along with the data structures to support them. It’s written in Scala with Mahout-specific
extensions, and runs most fully on Spark.
To get started with Apache Mahout 0.11.0, download the release artifacts and signatures from
http://www.apache.org/dist/mahout/0.11.0/.
Many thanks to the contributors and committers who were part of this release. Please see below
for the Release Highlights.
RELEASE HIGHLIGHTS
This is a minor release over Mahout 0.10.0 meant to introduce several new features and to
fix some bugs. Mahout 0.11.0 includes all new features and bugfixes released in Mahout versions
0.10.1 and 0.10.2.
Mahout 0.11.0 new features compared to Mahout 0.10.0
1. Spark 1.3 support.
2. In-core transpose view rewrites. Modifiable transpose views, e.g. (for (col <- a.t)
col := 5).
3. Performance and parallelization improvements for AB', A'B, A'A Spark physical operators.
This speeds up SimilarityAnalysis and its associated jobs, spark-itemsimilarity and spark-rowsimilarity.
4. Optional structural "flavor" abstraction for in-core matrices. In-core matrices can
now be tagged as e.g. sparse or dense.
5. %*% optimization based on matrix flavors.
6. In-core ::= sparse assignment functions.
7. Assign := optimization (do proper traversal based on matrix flavors, similarly to %*%).
8. Adding in-place elementwise functional assignment (e.g. mxA := exp _, mxA ::= exp _).
9. Distributed and in-core versions of simple elementwise analogues of scala.math._. For
example, for log(x) the convention is dlog(drm), mlog(mx), vlog(vec). Unfortunately we cannot
overload these functions over what is done in scala.math, i.e. Scala would not allow log(mx)
or log(drm) and log(Double) at the same time, mainly because they are defined in different
packages.
10. Distributed and in-core first and second moment routines. R analogs: mean(), colMeans(),
rowMeans(), variance(), sd(). By convention, distributed versions are prefixed with the letter d:
colMeanVars(), colMeanStdevs(), dcolMeanVars(), dcolMeanStdevs().
11. Distance and squared distance matrix routines. R analog: dist(). Provides both squared
and non-squared Euclidean distance matrices. By convention, distributed versions are prefixed
with the letter d: dist(x), sqDist(x), dsqDist(x). There is also a variation for the pairwise
distance matrix of two different inputs x and y: sqDist(x,y), dsqDist(x,y).
12. DRM row sampling API.
13. Distributed performance bug fixes. This relates mostly to (a) matrix multiplication
deficiencies, and (b) handling parallelism.
14. Distributed engine-neutral allreduceBlock() operator API for Spark and H2O.
15. Distributed optimizer operators for elementwise functions. Rewrites recognizing e.g.
1 + drmX * dexp(drmX) as a single fused elementwise physical operator: elementwiseFunc(f1(f2(drmX)))
where f1 = 1 + x and f2 = exp(x).
16. More cbind, rbind flavors (e.g. 1 cbind mxX, 1 cbind drmX or the other way around) for
Spark and H2O.
17. Added +=: and *=: operators on vectors.
18. Closeable API for broadcast tensors.
19. Support for conversion of any type-keyed DRM into ordinally-keyed DRM.
20. Scala logging style.
21. rowSumsMap() summary for non-int-keyed DRMs.
22. Elementwise power operator ^.
23. R-like vector concatenation operator.
24. In-core functional assignments, e.g. mxA := { (x) => x * x }.
25. Straighten out behavior of Matrix.iterator() and iterateNonEmpty().
26. New mutable transposition view for in-core matrices. The in-core matrix transpose view
was rewritten with two main goals in mind: (1) enable mutability, e.g. for (col <- mxA.t) col
:= k; (2) translate matrix structural flavor for optimizers correctly, e.g. new SparseRowMatrix.t
carries on as a column-major structure.
27. Native support for Kryo serialization of tensor types.
28. Deprecation of MultiLayerPerceptron, ConcatenateVectorsJob and all related classes.
29. Deprecation of SparseColumnMatrix.
30. Fixes for a major memory usage bug in co-occurrence analysis used by the driver spark-itemsimilarity.
This will now require far less memory in the executor.
31. Some minor fixes to Mahout-Samsara QR Decomposition and matrix ops.
32. Trimmed down package size to < 200 MB.
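Item 15 describes fusing chained elementwise functions into a single physical operator. The plain-Scala sketch below illustrates only the general idea. It uses ordinary arrays rather than Mahout DRMs, and the names ElementwiseFusionSketch, twoPass and fused are hypothetical, not part of the Mahout API. It takes f1 = 1 + x and f2 = exp(x) from item 15 and shows that composing them first gives the same result in one traversal, without an intermediate collection.

```scala
// Illustration only: a plain Scala array stands in for a distributed matrix.
// None of these names belong to Mahout; they are assumptions for this sketch.
object ElementwiseFusionSketch {
  // The two elementwise functions named in item 15.
  val f1: Double => Double = x => 1 + x
  val f2: Double => Double = x => math.exp(x)

  // Unfused: two traversals, with an intermediate array between them.
  def twoPass(xs: Array[Double]): Array[Double] =
    xs.map(f2).map(f1)

  // Fused: compose the functions once, then traverse the data a single time.
  def fused(xs: Array[Double]): Array[Double] = {
    val f = f1.compose(f2) // f(x) = f1(f2(x)) = 1 + exp(x)
    xs.map(f)
  }
}
```

Both methods return the same values; the fused form simply avoids materializing the result of f2 before applying f1.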
Note: Mahout 0.11.0 artifacts seem to be binary compatible with Spark 1.4.
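As an illustration of the distance routines in item 11 above, the following small in-memory sketch shows what a pairwise squared distance matrix contains. It assumes each matrix row is one data point. The names PairwiseDistanceSketch, pairwiseSqDist and pairwiseDist are hypothetical stand-ins written in plain Scala, not Mahout's sqDist or dsqDist implementations.

```scala
// Illustration only: nested arrays stand in for in-core matrices.
// Assumption for this sketch: every row is one data point.
object PairwiseDistanceSketch {
  type Matrix = Array[Array[Double]]

  // Squared Euclidean distance between two points of equal dimension.
  private def sq(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => (u - v) * (u - v) }.sum

  // Entry (i, j) is the squared distance between row i of x and row j of y.
  def pairwiseSqDist(x: Matrix, y: Matrix): Matrix =
    x.map(xi => y.map(yj => sq(xi, yj)))

  // Single-input variant: distances among the rows of x itself.
  def pairwiseSqDist(x: Matrix): Matrix = pairwiseSqDist(x, x)

  // Non-squared variant: elementwise square root of the squared distances.
  def pairwiseDist(x: Matrix): Matrix =
    pairwiseSqDist(x).map(row => row.map(d => math.sqrt(d)))
}
```

For the two points (0, 0) and (3, 4), the off-diagonal squared distance is 25 and the non-squared distance is 5, while the diagonal entries are 0.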
STATS
A total of 48 separate JIRA issues are addressed in this release [2] with 7 bugfixes.
GETTING STARTED
Download the release artifacts and signatures at http://www.apache.org/dist/mahout/0.11.0/
The examples directory contains several working examples of the core functionality available
in Mahout. These can be run via scripts in the examples/bin directory. Most examples do not
need a Hadoop cluster in order to run.
FUTURE PLANS
Integration with Apache Flink is in the works in collaboration with TU Berlin and Data Artisans
to add Flink as the third execution engine for Mahout. This would be in addition to the existing
Apache Spark and H2O engines.
To see progress on this branch, look here: https://github.com/apache/mahout/commits/master.
KNOWN ISSUES
In the non-source zip or tar, the example data for mahout/examples/bin/run-item-sim is missing.
To run it, get the CSV files from GitHub <https://github.com/apache/mahout/tree/mahout-0.10.x/examples/src/main/resources> [4].
CREDITS
As with any release, we wish to thank all of the users and contributors to Mahout. Please
see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too
many to list here.
[1] https://github.com/apache/mahout/blob/master/CHANGELOG
[2] https://issues.apache.org/jira/browse/MAHOUT-1757?jql=project%20%3D%20MAHOUT%20AND%20status%20in%20%28Resolved%2C%20closed%29%20AND%20%28fixVersion%20%3D%200.10.1%20OR%20fixVersion%20%3D%200.10.2%20OR%20fixVersion%20%3D%200.11.0%29
[3] http://mahout.apache.org/developers/how-to-contribute.html
[4] https://github.com/apache/mahout/tree/master/examples/src/main/resources
