spark-commits mailing list archives

From sro...@apache.org
Subject spark git commit: [SPARK-19759][ML] not using blas in ALSModel.predict for optimization
Date Sat, 11 Nov 2017 10:10:57 GMT
Repository: spark
Updated Branches:
  refs/heads/master 808e886b9 -> 3eb315d71


[SPARK-19759][ML] not using blas in ALSModel.predict for optimization

## What changes were proposed in this pull request?

`ALSModel.predict` currently uses `blas.sdot` to compute the dot product of two `Seq`s, which
requires converting both `Seq`s to arrays on every call. It turns out that this is not the most
efficient way.

I used the following code to compare the implementations:

```
// Runs a block once and prints the elapsed wall-clock time in nanoseconds.
def time[R](block: => R): Unit = {
    val t0 = System.nanoTime()
    block
    val t1 = System.nanoTime()
    println("Elapsed time: " + (t1 - t0) + "ns")
}

val r = new scala.util.Random(100)
// 500000 feature vectors of 100 floats each.
val input = (1 to 500000).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)

// Plain loop dot product that indexes the Seqs directly, with no array conversion.
def f(a: Seq[Float], b: Seq[Float]): Float = {
    var r = 0.0f
    for (i <- 0 until a.length) {
        r += a(i) * b(i)
    }
    r
}

import com.github.fommil.netlib.BLAS.{getInstance => blas}
val b = (1 to 100).map(_ => r.nextFloat).toSeq

// BLAS version: both Seqs are converted to arrays inside the timed region.
time { input.foreach(a => blas.sdot(100, a.toArray, 1, b.toArray, 1)) }
// on average it takes 2968718815 ns
time { input.foreach(a => f(a, b)) }
// on average it takes 515510185 ns
```

Thus this PR replaces the `blas.sdot` call with a plain index-based loop (a `while` loop in the
patch) for performance reasons.
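For illustration only, here is a minimal standalone sketch of a loop-based dot product along the lines described above. The object name `DotProductSketch`, the method name `dot`, and the explicit `rank` parameter are hypothetical and chosen for this example; they are not part of the patch, which keeps the logic inside the existing `predict` UDF.

```scala
// Illustrative sketch; `DotProductSketch` and `dot` are hypothetical names, not Spark API.
object DotProductSketch {

  /**
   * Dot product over the first `rank` elements of two sequences,
   * reading the elements by index instead of converting to arrays.
   * Returns NaN if either input is null.
   */
  def dot(featuresA: Seq[Float], featuresB: Seq[Float], rank: Int): Float = {
    if (featuresA != null && featuresB != null) {
      var dotProduct = 0.0f
      var i = 0
      // A while loop avoids allocating a Range and a closure per call.
      while (i < rank) {
        dotProduct += featuresA(i) * featuresB(i)
        i += 1
      }
      dotProduct
    } else {
      Float.NaN
    }
  }
}
```

Note that indexed access such as `featuresA(i)` is only cheap on indexed sequences (for example `Vector` or an array-backed `Seq`); on a linked `List` each lookup is linear, so this approach assumes the inputs are indexed.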

## How was this patch tested?

Existing unit tests.

Author: Marco Gaido <mgaido@hortonworks.com>

Closes #19685 from mgaido91/SPARK-19759.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3eb315d7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3eb315d7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3eb315d7

Branch: refs/heads/master
Commit: 3eb315d7141d69ac040dcba498dd863b6d217775
Parents: 808e886
Author: Marco Gaido <mgaido@hortonworks.com>
Authored: Sat Nov 11 04:10:54 2017 -0600
Committer: Sean Owen <sowen@cloudera.com>
Committed: Sat Nov 11 04:10:54 2017 -0600

----------------------------------------------------------------------
 .../scala/org/apache/spark/ml/recommendation/ALS.scala    | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3eb315d7/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
index a884366..81a8f50 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
@@ -289,9 +289,13 @@ class ALSModel private[ml] (
 
   private val predict = udf { (featuresA: Seq[Float], featuresB: Seq[Float]) =>
     if (featuresA != null && featuresB != null) {
-      // TODO(SPARK-19759): try dot-producting on Seqs or another non-converted type for
-      // potential optimization.
-      blas.sdot(rank, featuresA.toArray, 1, featuresB.toArray, 1)
+      var dotProduct = 0.0f
+      var i = 0
+      while (i < rank) {
+        dotProduct += featuresA(i) * featuresB(i)
+        i += 1
+      }
+      dotProduct
     } else {
       Float.NaN
     }

