From: srowen
To: reviews@spark.apache.org
Reply-To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...
Content-Type: text/plain
Message-Id: <20181027195749.C6435E05E3@git1-us-west.apache.org>
Date: Sat, 27 Oct 2018 19:57:49 +0000 (UTC)

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228724515

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -49,7 +50,16 @@ class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) {
         "Try reducing the parameter k for PCA, or reduce the input feature " +
         "vector dimension to make this tractable.")

-    val mat = new RowMatrix(sources)
+    val mat = if (numFeatures > 65535) {
+      val meanVector = Statistics.colStats(sources).mean
--- End diff --

Rather than calling `.toArray` and `.zipped` below, could this be written as Vector - Vector in the loop below? It might be more efficient.
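
To make the efficiency question concrete, here is a minimal sketch of the two ways to subtract a mean vector from a row. It is not the code from the pull request: plain `Array[Double]` stands in for Spark's vector types so that no Spark API is assumed, and `CenteringSketch`, `centerZipped`, and `centerLoop` are hypothetical names used only for illustration.

```scala
// Hypothetical sketch: plain arrays stand in for Spark's vector types.
object CenteringSketch {

  // Zip-style centering: pairs up the elements first, then maps over the pairs.
  // `zip` builds an intermediate array of tuples before the subtraction runs.
  def centerZipped(row: Array[Double], mean: Array[Double]): Array[Double] = {
    require(row.length == mean.length, "dimension mismatch")
    row.zip(mean).map { case (x, m) => x - m }
  }

  // Single-pass centering: one output allocation and an index loop,
  // with no intermediate collection of pairs.
  def centerLoop(row: Array[Double], mean: Array[Double]): Array[Double] = {
    require(row.length == mean.length, "dimension mismatch")
    val out = new Array[Double](row.length)
    var i = 0
    while (i < row.length) {
      out(i) = row(i) - mean(i)
      i += 1
    }
    out
  }
}
```

Both variants return the same values; they differ only in how many intermediate objects they allocate per row, which is the cost that matters when the feature dimension exceeds 65535.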