spark-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sro...@apache.org
Subject spark git commit: [MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.
Date Sat, 25 Jun 2016 08:11:30 GMT
Repository: spark
Updated Branches:
  refs/heads/master a7d29499d -> a3c7b4187


[MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException.
I have found the bug and tested the solution.

## What changes were proposed in this pull request?

Just adjust the size of an array in line 58 so it does not cause an ArrayOutOfBoundsException
in line 66.

## How was this patch tested?

Manual tests. I have recompiled the entire project with the fix, it has been built successfully
and I have run the code, also with good results.

line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian()
* 0.1
crashes because trueWeights has length "nfeatures + 1" while "x" has length "features", and
they should have the same length.

To fix this just make trueWeights be the same length as x.

I have recompiled the project with the change and it is working now:
[spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator
mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test

And it generates the data successfully now in the specified folder.

Author: José Antonio <joseanmunoz@gmail.com>

Closes #13895 from j4munoz/patch-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3c7b418
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3c7b418
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3c7b418

Branch: refs/heads/master
Commit: a3c7b4187bad00dad87df7e3b5929a44d29568ed
Parents: a7d2949
Author: José Antonio <joseanmunoz@gmail.com>
Authored: Sat Jun 25 09:11:25 2016 +0100
Committer: Sean Owen <sowen@cloudera.com>
Committed: Sat Jun 25 09:11:25 2016 +0100

----------------------------------------------------------------------
 .../main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a3c7b418/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
index cde5979..c946860 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/util/SVMDataGenerator.scala
@@ -55,7 +55,7 @@ object SVMDataGenerator {
     val sc = new SparkContext(sparkMaster, "SVMGenerator")
 
     val globalRnd = new Random(94720)
-    val trueWeights = Array.fill[Double](nfeatures + 1)(globalRnd.nextGaussian())
+    val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())
 
     val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map { idx =>
       val rnd = new Random(42 + idx)


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


Mime
View raw message