From: akopich
To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...
Date: Thu, 2 Nov 2017 11:42:43 +0000 (UTC)

GitHub user akopich commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19565#discussion_r148507781

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -497,40 +481,46 @@ final class OnlineLDAOptimizer extends LDAOptimizer with Logging {
           (u._1, u._2, u._3 + v._3)
         }
    -    val (statsSum: BDM[Double], logphatOption: Option[BDV[Double]], nonEmptyDocsN: Long) = stats
    -      .treeAggregate((BDM.zeros[Double](k, vocabSize), logphatPartOptionBase(), 0L))(
    -        elementWiseSum, elementWiseSum
    -      )
    +    val (statsSum: BDM[Double], logphatOption: Option[BDV[Double]], batchSize: Long) =
    +      batch.treeAggregate((BDM.zeros[Double](k, vocabSize), logphatPartOptionBase(), 0L))({
    +        case (acc, (_, termCounts)) =>
    +          val stat = BDM.zeros[Double](k, vocabSize)
    --- End diff --

    Actually, we can fix this without falling back to `mapPartitions`.
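The commented line allocates a fresh `BDM.zeros[Double](k, vocabSize)` inside the new `treeAggregate` seqOp, presumably once per document. One possible reading of "fix this without falling back to `mapPartitions`" is to let the seqOp and combOp mutate and return their first argument. Spark's `RDD.aggregate` documentation explicitly allows this to avoid allocation, and `treeAggregate` is documented as semantically identical to `aggregate`. The sketch below is not the PR's code. To stay self-contained it uses plain Scala collections instead of Spark and Breeze: partitions are simulated as nested `Seq`s, the k x vocabSize matrix is a flat row-major `Array[Double]`, and `Acc`, `seqOp`, `combOp`, and `aggregatePartitions` are hypothetical names. The per-term update, which splits each count evenly across topics, is a placeholder and not the variational statistics `OnlineLDAOptimizer` actually computes.

```scala
object InPlaceTreeAggregateSketch {
  // Hypothetical document representation: (docId, sparse term counts as (termIndex, count) pairs).
  type Doc = (Long, Seq[(Int, Double)])

  // Accumulator: a flattened k x vocabSize statistics matrix (row-major) plus a document counter.
  final class Acc(val k: Int, val vocabSize: Int) {
    val stats: Array[Double] = new Array[Double](k * vocabSize)
    var docs: Long = 0L
  }

  // seqOp: folds one document into the accumulator by mutating it in place,
  // so no new k x vocabSize matrix is allocated per document.
  // Placeholder update: each term count is split evenly across the k topics.
  def seqOp(acc: Acc, doc: Doc): Acc = {
    val (_, termCounts) = doc
    termCounts.foreach { case (term, count) =>
      var topic = 0
      while (topic < acc.k) {
        acc.stats(topic * acc.vocabSize + term) += count / acc.k
        topic += 1
      }
    }
    acc.docs += 1
    acc
  }

  // combOp: merges the right accumulator into the left one, again in place.
  def combOp(left: Acc, right: Acc): Acc = {
    var i = 0
    while (i < left.stats.length) {
      left.stats(i) += right.stats(i)
      i += 1
    }
    left.docs += right.docs
    left
  }

  // Local stand-in for RDD.treeAggregate (without the multi-level tree):
  // one fresh zero value per partition, then partition results merged with combOp.
  def aggregatePartitions(partitions: Seq[Seq[Doc]], k: Int, vocabSize: Int): Acc =
    partitions
      .map(part => part.foldLeft(new Acc(k, vocabSize))(seqOp))
      .reduceOption(combOp)
      .getOrElse(new Acc(k, vocabSize))

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq((0L, Seq((0, 2.0), (2, 1.0))), (1L, Seq((1, 4.0)))),
      Seq((2L, Seq((2, 3.0))))
    )
    val result = aggregatePartitions(partitions, k = 2, vocabSize = 3)
    println(result.docs)                   // 3
    println(result.stats.mkString(","))    // 1.0,2.0,2.0,1.0,2.0,2.0
  }
}
```

The design choice is that only one matrix exists per partition, the zero value, instead of one per document. That keeps the single `treeAggregate` pass and its tree-shaped merge without restructuring the code around `mapPartitions`.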