From: mengxr
To: infrastructure-dev@apache.org
Reply-To: infrastructure-dev@apache.org
Subject: [GitHub] incubator-spark pull request: SPARK-1122: allCollect functions for...
Message-Id: <20140225024243.56CBA925583@tyr.zones.apache.org>
Date: Tue, 25 Feb 2014 02:42:43 +0000 (UTC)

GitHub user mengxr commented on the pull request:

https://github.com/apache/incubator-spark/pull/635#issuecomment-35969142

@JoshRosen I didn't implement this ... AllCollect is better than using broadcast variables directly when it is implemented without putting heavy load on the driver and the data is not very small. The current implementation, however, is no better than broadcasting the variables directly. A slightly better solution would be shuffle-based: it puts no load on the driver, but it might create duplicate blocks on the same physical node.
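The trade-off described above can be sketched with a toy Python model. This is not Spark code and not the implementation in the pull request. It assumes the driver-based variant gathers every partition on the driver and ships one full copy to each node, which the comment implies but does not spell out. The function names, the `node_of` placement list, and the use of record counts as a stand-in for bytes are all hypothetical.

```python
from collections import Counter


def all_collect_via_driver(partitions, node_of):
    """Toy model (assumption): the driver gathers every partition,
    then sends one full copy of the data to each node."""
    everything = [x for part in partitions for x in part]
    nodes = sorted(set(node_of))
    # Records flowing through the driver: one pass in, one full copy out per node.
    driver_traffic = len(everything) * (1 + len(nodes))
    copies_per_node = {n: 1 for n in nodes}
    return everything, driver_traffic, copies_per_node


def all_collect_via_shuffle(partitions, node_of):
    """Toy model (assumption): every partition receives all records through
    a shuffle, so the driver moves no data, but a node hosting several
    partitions ends up holding several copies."""
    everything = [x for part in partitions for x in part]
    driver_traffic = 0
    # One full copy per partition, grouped by the node hosting that partition.
    copies_per_node = dict(Counter(node_of))
    return everything, driver_traffic, copies_per_node


# Hypothetical layout: three partitions, two of them on node "a".
partitions = [[1, 2], [3], [4, 5, 6]]
node_of = ["a", "a", "b"]

print(all_collect_via_driver(partitions, node_of))
print(all_collect_via_shuffle(partitions, node_of))
```

In this model the shuffle-based variant removes all driver traffic, but node "a" holds two copies of the data because it hosts two partitions. That is the duplicate-block concern raised above.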
The efficient broadcasting you described sounds interesting, but yes I can imagine it is difficult to implement. Thanks for sharing your thoughts!