mahout-dev mailing list archives

From Andrew Palumbo <ap....@outlook.com>
Subject SparseRowMatrices from dense matrix operations
Date Thu, 08 Sep 2016 15:16:26 GMT
@ssc

Re: SparseRowMatrices from dense operations, there are some operations that use `SparseRowMatrix`
as the default for the accumulator in their combiners.  E.g.,

Spark ABt: https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/sparkbindings/blas/ABt.scala#L296


I believe it was implemented this way so that, in the worst case of an oversized in-core
Sparse %*% Dense matrix multiplication, a result that is too large would not throw an OOM
error. This is what we created the densityAnalysis(..) method for: to detect the actual
density of a matrix on the fly and to use the appropriate structure based on the data itself.
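
To make the idea concrete, here is a rough, self-contained sketch of a density check that picks a structure from the data. It is not Mahout's actual densityAnalysis(..) code: the class name `DensitySketch`, the method names, and the 0.25 threshold are all assumptions made up for illustration, and it counts every cell rather than doing anything smarter.

```java
public class DensitySketch {
    // Hypothetical cutoff: above this fraction of non-zero cells we call the block dense.
    static final double THRESHOLD = 0.25;

    /** Fraction of non-zero entries in the block (0.0 for an empty block). */
    static double density(double[][] block) {
        long nonZero = 0;
        long total = 0;
        for (double[] row : block) {
            for (double v : row) {
                if (v != 0.0) {
                    nonZero++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) nonZero / total;
    }

    /** True if the measured density exceeds the given threshold. */
    static boolean isDense(double[][] block, double threshold) {
        return density(block) > threshold;
    }

    /** Names the structure an accumulator could use for this block. */
    static String chooseStructure(double[][] block) {
        return isDense(block, THRESHOLD) ? "dense" : "sparse-row";
    }

    public static void main(String[] args) {
        // A fully populated block, like the product of two dense matrices.
        double[][] denseProduct = {{1.0, 2.0}, {3.0, 4.0}};
        // A block with a single non-zero cell out of ten.
        double[][] mostlyEmpty = {{0, 0, 0, 0, 0}, {0, 7, 0, 0, 0}};

        System.out.println(chooseStructure(denseProduct)); // prints "dense"
        System.out.println(chooseStructure(mostlyEmpty));  // prints "sparse-row"
    }
}
```

The point of the sketch is only that the choice is driven by the measured data rather than by a fixed default, so a fully populated result would not end up in a sparse row structure.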


densityAnalysis() is not being used in Spark ABt yet.  There is a Jira open to go through
and apply it in all appropriate cases: https://issues.apache.org/jira/browse/MAHOUT-1873?filter=-1


So currently, ABt (and possibly some other operations) will return a `SparseRowMatrix` as
the product of two dense matrices (if I'm reading it correctly).


It looks like this is a good candidate for densityAnalysis().
