spark-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jkbrad...@apache.org
Subject spark git commit: [SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasionly fail (cause by test dataset not deterministic)
Date Wed, 25 Oct 2017 21:31:53 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 9ed64048a -> 35725f735


[SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasionly fail (cause by test dataset not
deterministic)

## What changes were proposed in this pull request?

Fix NaiveBayes unit test occasionly fail:
Set seed for `BrzMultinomial.sample`, make `generateNaiveBayesInput` output deterministic
dataset.
(If we do not set seed, the generated dataset will be random, and the model will be possible
to exceed the tolerance in the test, which trigger this failure)

## How was this patch tested?

Manually run tests multiple times and check each time output models contains the same values.

Author: WeichenXu <weichen.xu@databricks.com>

Closes #19558 from WeichenXu123/fix_nb_test_seed.

(cherry picked from commit 841f1d776f420424c20d99cf7110d06c73f9ca20)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/35725f73
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/35725f73
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/35725f73

Branch: refs/heads/branch-2.2
Commit: 35725f735019132377d81b7cf13a6a4fb92aecfe
Parents: 9ed6404
Author: WeichenXu <weichen.xu@databricks.com>
Authored: Wed Oct 25 14:31:36 2017 -0700
Committer: Joseph K. Bradley <joseph@databricks.com>
Committed: Wed Oct 25 14:31:49 2017 -0700

----------------------------------------------------------------------
 .../org/apache/spark/ml/classification/NaiveBayesSuite.scala      | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/35725f73/mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
b/mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
index 3a2be23..7c9cf76 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala
@@ -20,7 +20,7 @@ package org.apache.spark.ml.classification
 import scala.util.Random
 
 import breeze.linalg.{DenseVector => BDV, Vector => BV}
-import breeze.stats.distributions.{Multinomial => BrzMultinomial}
+import breeze.stats.distributions.{Multinomial => BrzMultinomial, RandBasis => BrzRandBasis}
 
 import org.apache.spark.{SparkException, SparkFunSuite}
 import org.apache.spark.ml.classification.NaiveBayes.{Bernoulli, Multinomial}
@@ -329,6 +329,7 @@ object NaiveBayesSuite {
     val _pi = pi.map(math.exp)
     val _theta = theta.map(row => row.map(math.exp))
 
+    implicit val rngForBrzMultinomial = BrzRandBasis.withSeed(seed)
     for (i <- 0 until nPoints) yield {
       val y = calcLabel(rnd.nextDouble(), _pi)
       val xi = modelType match {


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


Mime
View raw message