Date: Fri, 7 Jun 2013 20:52:20 +0000 (UTC)
From: "Gabriel Reid (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Updated] (CRUNCH-213) Add sharded join functionality

     [ https://issues.apache.org/jira/browse/CRUNCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated CRUNCH-213:
--------------------------------
    Attachment: CRUNCH-213.patch

Patch to introduce sharded joins. The join code is also pretty thoroughly refactored, introducing the concept of a JoinStrategy, with three implementations: DefaultJoinStrategy, MapsideJoinStrategy, and ShardedJoinStrategy.

> Add sharded join functionality
> ------------------------------
>
>                 Key: CRUNCH-213
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-213
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-213.patch
>
>
> Performing joins where a large proportion of the values on one or both sides of the join are mapped to a single key can result in poor performance, as one reducer (or a small number of reducers) ends up handling most of the joining work, leaving the rest of the cluster idle.
> Sharded joining should be added to allow splitting up join keys, thereby distributing values mapped to a single key over multiple reducer partitions.
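
To make the idea in the description concrete, here is a minimal sketch of key sharding for an inner join. It is plain Python for illustration only and is not the code in CRUNCH-213.patch or the ShardedJoinStrategy API. The names `shard_left`, `replicate_right`, `sharded_inner_join`, and `NUM_SHARDS` are hypothetical, and the approach shown (a random shard suffix on one side, replication of the other side to every shard) is an assumption about one common way to split a hot key.

```python
import random
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_left(records, num_shards, rng):
    """Tag each left-side record with one randomly chosen shard of its key."""
    return [((key, rng.randrange(num_shards)), value) for key, value in records]


def replicate_right(records, num_shards):
    """Copy each right-side record to every shard of its key."""
    return [((key, shard), value)
            for key, value in records
            for shard in range(num_shards)]


def sharded_inner_join(left, right, num_shards=NUM_SHARDS, seed=0):
    """Inner join on key, grouping by the composite (key, shard) instead of key."""
    rng = random.Random(seed)
    # Each composite key stands in for one reducer group in this sketch.
    groups = defaultdict(lambda: ([], []))
    for composite, value in shard_left(left, num_shards, rng):
        groups[composite][0].append(value)
    for composite, value in replicate_right(right, num_shards):
        groups[composite][1].append(value)

    joined = []
    for (key, _shard), (left_vals, right_vals) in groups.items():
        # Groups with no left-side values produce nothing (inner join).
        for lv in left_vals:
            for rv in right_vals:
                joined.append((key, (lv, rv)))
    return joined
```

Each left-side value lands in exactly one shard and meets every matching right-side value there, so the output has the same pairs as an unsharded join, while the values of a heavily used key are spread over up to `num_shards` groups. The cost in this sketch is that the right side is copied `num_shards` times.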