spark-commits mailing list archives

From r...@apache.org
Subject git commit: MLLIB-25: Implicit ALS runs out of memory for moderately large numbers of features
Date Fri, 21 Feb 2014 20:46:16 GMT
Repository: incubator-spark
Updated Branches:
  refs/heads/master 45b15e27a -> c8a4c9b1f


MLLIB-25: Implicit ALS runs out of memory for moderately large numbers of features

There's a step in implicit ALS where the matrix `Yt * Y` is computed. It's computed as the
sum of matrices; an f x f matrix is created for each of n user/item rows in a partition. In
`ALS.scala:214`:

```
        factors.flatMapValues{ case factorArray =>
          factorArray.map{ vector =>
            val x = new DoubleMatrix(vector)
            x.mmul(x.transpose())
          }
        }.reduceByKeyLocally((a, b) => a.addi(b))
         .values
         .reduce((a, b) => a.addi(b))
```
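
As a reading aid, the sum the snippet above computes can be written as follows, assuming `y_i` denotes the i-th factor vector (row i of `Y`, taken as a column vector) and `n` the number of rows. The notation is introduced here for illustration and does not appear in the source:

```latex
Y^{T} Y \;=\; \sum_{i=1}^{n} y_i \, y_i^{T},
\qquad y_i \in \mathbb{R}^{f},
\quad y_i \, y_i^{T} \in \mathbb{R}^{f \times f}
```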

This is completely correct, but there's a subtle and quite large memory problem here. `map()`
eagerly creates all of these matrices in memory at once, even though they never need to exist
at the same time.
For example, if a partition has n = 100000 rows, and f = 200, then this intermediate product
requires 32GB of heap (100000 matrices of 200 x 200 doubles at 8 bytes each). The computation
will never work unless you can provide workers with (more than) that much heap.
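
The 32GB figure can be checked with a few lines of arithmetic. This is only a rough sketch: the object and method names are hypothetical, and it counts raw `double` storage only, ignoring JVM object headers and any jblas overhead.

```scala
// Hypothetical helper, not part of ALS.scala: estimates the heap needed
// when all n intermediate f x f matrices exist at the same time.
object YtYMemoryEstimate {
  // n matrices, each f * f doubles, 8 bytes per double (headers ignored).
  def intermediateBytes(n: Long, f: Long): Long = n * f * f * 8L

  def main(args: Array[String]): Unit = {
    val bytes = intermediateBytes(100000L, 200L)
    println(s"${bytes / 1000000000L} GB") // prints "32 GB"
  }
}
```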

Fortunately there's a trivial change that fixes it: call `.view` on `factorArray` before `map`,
so that each matrix is created only as it is consumed.
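
The difference between the strict and the lazy `map` can be illustrated outside Spark. The sketch below is not the ALS code: it uses plain nested arrays as a stand-in for jblas `DoubleMatrix`, and `ViewSketch`, `outer`, `addInPlace`, `ytyStrict` and `ytyLazy` are all hypothetical names. It assumes at least one input row, since `reduce` fails on an empty collection.

```scala
// Illustrative stand-in for the ALS computation, using plain arrays.
object ViewSketch {
  type Matrix = Array[Array[Double]]

  // Outer product x * x^T, allocating a fresh f x f matrix.
  def outer(x: Array[Double]): Matrix = x.map(a => x.map(b => a * b))

  // Adds b into a in place and returns a (similar in spirit to addi).
  def addInPlace(a: Matrix, b: Matrix): Matrix = {
    for (i <- a.indices; j <- a(i).indices) a(i)(j) += b(i)(j)
    a
  }

  // Strict: map builds an array holding all n matrices before reduce starts.
  def ytyStrict(rows: Array[Array[Double]]): Matrix =
    rows.map(outer).reduce((a, b) => addInPlace(a, b))

  // Lazy: the view defers each outer product until reduce asks for it,
  // so only the accumulator and the current matrix need to be live.
  def ytyLazy(rows: Array[Array[Double]]): Matrix =
    rows.view.map(outer).reduce((a, b) => addInPlace(a, b))
}
```

Both methods return the same sum; only the peak number of live intermediate matrices differs.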

Author: Sean Owen <sowen@cloudera.com>

Closes #629 from srowen/ALSMatrixAllocationOptimization and squashes the following commits:

062cda9 [Sean Owen] Update style per review comments
e9a5d63 [Sean Owen] Avoid unnecessary out of memory situation by not simultaneously allocating
lots of matrices


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/c8a4c9b1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/c8a4c9b1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/c8a4c9b1

Branch: refs/heads/master
Commit: c8a4c9b1f6005815f5a4a331970624d1706b6b13
Parents: 45b15e2
Author: Sean Owen <sowen@cloudera.com>
Authored: Fri Feb 21 12:46:12 2014 -0800
Committer: Reynold Xin <rxin@apache.org>
Committed: Fri Feb 21 12:46:12 2014 -0800

----------------------------------------------------------------------
 .../main/scala/org/apache/spark/mllib/recommendation/ALS.scala   | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/c8a4c9b1/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index c668b04..8958040 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -211,8 +211,8 @@ class ALS private (var numBlocks: Int, var rank: Int, var iterations: Int, var l
   def computeYtY(factors: RDD[(Int, Array[Array[Double]])]) = {
     if (implicitPrefs) {
       Option(
-        factors.flatMapValues{ case factorArray =>
-          factorArray.map{ vector =>
+        factors.flatMapValues { case factorArray =>
+          factorArray.view.map { vector =>
             val x = new DoubleMatrix(vector)
             x.mmul(x.transpose())
           }

