spark-reviews mailing list archives

From jkbradley <...@git.apache.org>
Subject [GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Date Mon, 17 Jul 2017 16:22:31 GMT
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18281#discussion_r127753960
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
    @@ -101,6 +101,50 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
         assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix absTol 400)
       }
     
    +  test("one-vs-rest: tuning parallelism does not change output") {
    +    val numClasses = 3
    +    val ovaPar1 = new OneVsRest()
    +      .setClassifier(new LogisticRegression)
    +
    +    val ovaModelPar1 = ovaPar1.fit(dataset)
    +
    +    val transformedDatasetPar1 = ovaModelPar1.transform(dataset)
    +
    +    val ovaResultsPar1 = transformedDatasetPar1.select("prediction", "label").rdd.map {
    +      row => (row.getDouble(0), row.getDouble(1))
    +    }
    +
    +    val ovaPar2 = new OneVsRest()
    +      .setClassifier(new LogisticRegression)
    +      .setParallelism(2)
    +
    +    val ovaModelPar2 = ovaPar2.fit(dataset)
    +
    +    val transformedDatasetPar2 = ovaModelPar2.transform(dataset)
    +
    +    val ovaResultsPar2 = transformedDatasetPar2.select("prediction", "label").rdd.map {
    +      row => (row.getDouble(0), row.getDouble(1))
    +    }
    +
    +    val metricsPar1 = new MulticlassMetrics(ovaResultsPar1)
    +    val metricsPar2 = new MulticlassMetrics(ovaResultsPar2)
    +    assert(metricsPar1.confusionMatrix == metricsPar2.confusionMatrix)
    +
    +    for (i <- 0 until ovaModelPar1.models.length) {
    +      var foundCloseCoeffs = false
    +      val currentCoeffs = ovaModelPar1.models(i)
    +                                      .asInstanceOf[LogisticRegressionModel].coefficients
    +      for (j <- 0 until ovaModelPar2.models.length) {
    --- End diff --
    
    Commenting here again since the comment is now hidden: This seems like a roundabout way
    to compare the models. Can you just zip the two arrays of models together and compare the
    pairs? (See response in old comment.)
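
    A minimal sketch of the zip-and-compare idea, not the code from the PR. To stay
    runnable without Spark it uses a hypothetical `FakeModel` case class in place of
    `LogisticRegressionModel`, and an assumed relative tolerance of 1e-3; `closeTo` and
    `sameModels` are illustrative names. In the suite, the pairs would presumably come
    from zipping `ovaModelPar1.models` with `ovaModelPar2.models`.

```scala
// Hypothetical stand-in for a fitted binary model; in the suite each element
// would be a LogisticRegressionModel taken from the OneVsRest model's `models` array.
final case class FakeModel(coefficients: Array[Double], intercept: Double)

object ZipCompareSketch {
  // True when two doubles agree within a relative tolerance (1e-3 is an assumption).
  def closeTo(a: Double, b: Double, relTol: Double = 1e-3): Boolean = {
    val scale = math.max(math.abs(a), math.abs(b))
    math.abs(a - b) <= relTol * scale
  }

  // Pair the models positionally with zip and require every pair to match.
  def sameModels(left: Array[FakeModel], right: Array[FakeModel]): Boolean =
    left.length == right.length &&
      left.zip(right).forall { case (m1, m2) =>
        m1.coefficients.length == m2.coefficients.length &&
          closeTo(m1.intercept, m2.intercept) &&
          m1.coefficients.zip(m2.coefficients).forall { case (c1, c2) => closeTo(c1, c2) }
      }

  def main(args: Array[String]): Unit = {
    val par1 = Array(FakeModel(Array(0.5, -1.2), 0.1), FakeModel(Array(2.0, 0.3), -0.4))
    val par2 = Array(FakeModel(Array(0.5, -1.2), 0.1), FakeModel(Array(2.0, 0.3), -0.4))
    assert(sameModels(par1, par2))          // same models in the same order
    assert(!sameModels(par1, par2.reverse)) // order matters when zipping
  }
}
```

    Zipping assumes both fits return their per-class models in the same order. If that
    holds, the nested search over `j` in the diff above is unnecessary.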



