From: jkbradley@apache.org
To: commits@spark.apache.org
Message-Id: <86a3f97903d440a1b96c09c6d8ad95a2@git.apache.org>
Subject: spark git commit: [SPARK-22905][MLLIB] Fix ChiSqSelectorModel save implementation
Date: Fri, 29 Dec 2017 01:32:34 +0000 (UTC)

Repository: spark
Updated Branches:
  refs/heads/master ffe6fd77a -> c74573084


[SPARK-22905][MLLIB] Fix ChiSqSelectorModel save implementation

## What changes were proposed in this pull request?

Currently, in `ChiSqSelectorModel`, save:
```
spark.createDataFrame(dataArray).repartition(1).write...
```
The default number of partitions used by `createDataFrame` is `defaultParallelism`, and the current `RoundRobinPartitioning` does not guarantee that `repartition` produces rows in the same order as the local array. We need to fix it.

## How was this patch tested?

N/A

Author: WeichenXu

Closes #20088 from WeichenXu123/fix_chisq_model_save.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c7457308
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c7457308
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c7457308

Branch: refs/heads/master
Commit: c74573084e3fb5f8005433c3631a99a85e1c2c7b
Parents: ffe6fd7
Author: WeichenXu
Authored: Thu Dec 28 17:32:30 2017 -0800
Committer: Joseph K. Bradley
Committed: Thu Dec 28 17:32:30 2017 -0800

----------------------------------------------------------------------
 .../main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/c7457308/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
index 32f1555..f923be8 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
@@ -144,7 +144,7 @@ object ChiSqSelectorModel extends Loader[ChiSqSelectorModel] {
       val dataArray = Array.tabulate(model.selectedFeatures.length) { i =>
         Data(model.selectedFeatures(i))
       }
-      spark.createDataFrame(dataArray).repartition(1).write.parquet(Loader.dataPath(path))
+      spark.createDataFrame(sc.makeRDD(dataArray, 1)).write.parquet(Loader.dataPath(path))
     }
 
     def load(sc: SparkContext, path: String): ChiSqSelectorModel = {
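
The ordering concern described in the commit message can be illustrated with a toy model in plain Scala. This is a sketch and not Spark code: `slice`, `mergeInArrivalOrder`, and the sample feature indices are hypothetical names and values invented for illustration, and the arrival order of the chunks is passed in explicitly instead of modeling how Spark actually moves data during a shuffle. It only shows that merging several chunks into one can reorder elements, while starting from a single chunk (the analogue of `sc.makeRDD(dataArray, 1)` in the diff) leaves nothing to reorder.

```scala
object RepartitionOrderSketch {
  // Split a local sequence into `numSlices` contiguous chunks.
  // Toy stand-in for a dataset spread over several partitions.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    require(numSlices > 0, "numSlices must be positive")
    (0 until numSlices).map { i =>
      val start = ((i * data.length.toLong) / numSlices).toInt
      val end = (((i + 1) * data.length.toLong) / numSlices).toInt
      data.slice(start, end)
    }
  }

  // Concatenate chunks into one sequence, in the order the chunks "arrive".
  // `arrival` is an assumption of this sketch: a permutation of chunk indices.
  def mergeInArrivalOrder[T](chunks: Seq[Seq[T]], arrival: Seq[Int]): Seq[T] =
    arrival.flatMap(i => chunks(i))

  def main(args: Array[String]): Unit = {
    // Hypothetical selected feature indices, already in the order to be saved.
    val selectedFeatures = Seq(3, 7, 11, 19, 42, 57)

    // Several chunks merged into one: the result depends on arrival order.
    val chunks = slice(selectedFeatures, 3)
    val merged = mergeInArrivalOrder(chunks, Seq(2, 0, 1))
    println(s"merged from 3 chunks: $merged")

    // A single chunk from the start: there is only one possible order.
    val single = mergeInArrivalOrder(slice(selectedFeatures, 1), Seq(0))
    println(s"single chunk keeps order: ${single == selectedFeatures}")
  }
}
```

In the toy model the merged result keeps every element but not the original sequence, which is the property the saved feature indices must not lose.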