From: "Gabriel Reid (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Date: Sat, 8 Jun 2013 07:02:21 +0000 (UTC)
Subject: [jira] [Commented] (CRUNCH-213) Add sharded join functionality


    [ https://issues.apache.org/jira/browse/CRUNCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678696#comment-13678696 ]

Gabriel Reid commented on CRUNCH-213:
-------------------------------------

I've run the integration tests with both hadoop-1 (i.e. the default) and hadoop-2, and everything passes. Looking at the join tests themselves, it appears that there's nothing order-sensitive there.

> Add sharded join functionality
> ------------------------------
>
>                 Key: CRUNCH-213
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-213
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-213.patch
>
>
> Performing joins where a large proportion of the values on one or both sides of the join are mapped to a single key can result in poor performance, as one reducer (or a small number of reducers) ends up handling most of the joining work, leaving the rest of the cluster idle.
> Sharded joining should be added to allow splitting up join keys, thereby distributing values mapped to a single key over multiple reducer partitions.
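
For readers unfamiliar with the idea in the issue description, here is a minimal sketch of key sharding in plain Python. It is not the Crunch API and not the contents of CRUNCH-213.patch; the function name `sharded_inner_join`, its parameters, and the choice to shard the left side randomly while replicating the right side are all assumptions made for illustration. Each left-side record gets a shard number appended to its key, and each right-side record is copied once per shard, so the values of a single hot key are spread over several groups instead of landing in one.

```python
import random
from collections import defaultdict


def sharded_inner_join(left, right, num_shards, seed=0):
    """Inner-join two lists of (key, value) pairs using sharded keys.

    Hypothetical illustration only: each (key, shard) pair stands in for
    a group that a single reducer would process.
    """
    rng = random.Random(seed)

    # Left side: assign every record to one randomly chosen shard of its key.
    left_groups = defaultdict(list)
    for key, value in left:
        shard = rng.randrange(num_shards)
        left_groups[(key, shard)].append(value)

    # Right side: replicate every record to all shards of its key, so that
    # each left record still meets every matching right record exactly once.
    right_groups = defaultdict(list)
    for key, value in right:
        for shard in range(num_shards):
            right_groups[(key, shard)].append(value)

    joined = []
    group_sizes = {}  # number of left-side values handled per (key, shard)
    for composite_key, left_values in left_groups.items():
        right_values = right_groups.get(composite_key, [])
        group_sizes[composite_key] = len(left_values)
        key, _shard = composite_key
        for left_value in left_values:
            for right_value in right_values:
                joined.append((key, (left_value, right_value)))
    return joined, group_sizes
```

The join result is the same as for an unsharded inner join. The trade-off in this sketch is that the replicated side grows by a factor of `num_shards`, so it only pays off when that side is small relative to the skewed side.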