spark-commits mailing list archives

From jkbrad...@apache.org
Subject spark git commit: [SPARK-22905][MLLIB] Fix ChiSqSelectorModel save implementation
Date Fri, 29 Dec 2017 01:32:34 GMT
Repository: spark
Updated Branches:
  refs/heads/master ffe6fd77a -> c74573084


[SPARK-22905][MLLIB] Fix ChiSqSelectorModel save implementation

## What changes were proposed in this pull request?

Currently, `ChiSqSelectorModel` save writes the selected features with:
```
spark.createDataFrame(dataArray).repartition(1).write...
```
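By default, the DataFrame that `createDataFrame` builds from the local array has `defaultParallelism` partitions.
`repartition(1)` then merges them with `RoundRobinPartitioning`, which does not guarantee that the rows in the
resulting single partition keep the order of the local array, so the selected features may be saved in a
different order. This patch fixes it by building a single-partition RDD with `sc.makeRDD(dataArray, 1)`,
so no repartitioning is needed.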
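The ordering problem can be illustrated with a toy model in plain Scala. This is a sketch, not Spark's shuffle implementation; the object name, helper, and feature values are hypothetical. It assumes each input partition contributes one block to the single output partition and that the reducer appends blocks in the order they are fetched, which need not match the partition index.

```scala
// Toy model, not Spark code: each map task (one per input partition) emits
// its rows as one block; with repartition(1) every block goes to the single
// reduce partition, which concatenates the blocks in the order it fetches them.
object RepartitionOrderToy {

  // Concatenate map outputs in the given fetch order (a permutation of indices).
  def singleReducePartition[T](mapOutputs: Seq[Seq[T]], fetchOrder: Seq[Int]): Seq[T] = {
    require(fetchOrder.sorted == mapOutputs.indices,
      "fetchOrder must be a permutation of the map partition indices")
    fetchOrder.flatMap(i => mapOutputs(i))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical selected feature indices, already split across 3 partitions
    val mapOutputs = Seq(Seq(3, 7), Seq(12, 20), Seq(31, 42))
    // Fetch order happens to match partition index: original order survives
    println(singleReducePartition(mapOutputs, Seq(0, 1, 2))) // List(3, 7, 12, 20, 31, 42)
    // Any other fetch order: rows come out in a different order
    println(singleReducePartition(mapOutputs, Seq(2, 0, 1))) // List(31, 42, 3, 7, 12, 20)
  }
}
```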

## How was this patch tested?

N/A

Author: WeichenXu <weichen.xu@databricks.com>

Closes #20088 from WeichenXu123/fix_chisq_model_save.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c7457308
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c7457308
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c7457308

Branch: refs/heads/master
Commit: c74573084e3fb5f8005433c3631a99a85e1c2c7b
Parents: ffe6fd7
Author: WeichenXu <weichen.xu@databricks.com>
Authored: Thu Dec 28 17:32:30 2017 -0800
Committer: Joseph K. Bradley <joseph@databricks.com>
Committed: Thu Dec 28 17:32:30 2017 -0800

----------------------------------------------------------------------
 .../main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/c7457308/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
index 32f1555..f923be8 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
@@ -144,7 +144,7 @@ object ChiSqSelectorModel extends Loader[ChiSqSelectorModel] {
       val dataArray = Array.tabulate(model.selectedFeatures.length) { i =>
         Data(model.selectedFeatures(i))
       }
-      spark.createDataFrame(dataArray).repartition(1).write.parquet(Loader.dataPath(path))
+      spark.createDataFrame(sc.makeRDD(dataArray, 1)).write.parquet(Loader.dataPath(path))
     }
 
     def load(sc: SparkContext, path: String): ChiSqSelectorModel = {
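With the fix, `sc.makeRDD(dataArray, 1)` puts the whole array in one partition, so there is no shuffle and no fetch order to depend on. The sketch below models partitioning of a local collection as contiguous slicing, an assumption based on how `parallelize` (which `makeRDD` delegates to) slices a `Seq`; `MakeRddOrderSketch` and `partitionsOf` are hypothetical names, not Spark APIs.

```scala
// Simplified model (an assumption, not Spark's ParallelCollectionRDD code):
// a local collection is cut into numSlices contiguous slices, one per partition.
object MakeRddOrderSketch {

  // Partition i holds elements [start, end) of the input, in input order.
  def partitionsOf[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    (0 until numSlices).map { i =>
      val start = (i.toLong * data.length / numSlices).toInt
      val end = ((i + 1).toLong * data.length / numSlices).toInt
      data.slice(start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical selected feature indices
    val dataArray = Array(3, 7, 12, 20, 31, 42)
    val parts = partitionsOf(dataArray.toSeq, 1)
    println(parts.length)              // 1
    println(parts.head.mkString(", ")) // 3, 7, 12, 20, 31, 42
  }
}
```

With a single slice, the one partition is exactly the array in its original order, so the rows are written as they appear in `dataArray`.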



