Date: Mon, 19 Aug 2013 16:10:50 +0000 (UTC)
From: "Gabriel Reid (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-216) Transpose arguments in MapsideJoinStrategy.join

    [ https://issues.apache.org/jira/browse/CRUNCH-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743949#comment-13743949 ]

Gabriel Reid commented on CRUNCH-216:
-------------------------------------

I have the feeling that it's better to stay away from trying to be too clever with that stuff. I find that even when I remember to implement a decent scaleFactor method, it's still pretty hit and miss with getting reliable sizes from the getSize method (i.e. it's just really hard to do it correctly).
On the other hand, usually when you're using a MapSideJoin there is going to be a really big difference in the size of collections being joined, so maybe it would be ok even if the size heuristic isn't that reliable.

> Transpose arguments in MapsideJoinStrategy.join
> -----------------------------------------------
>
>                 Key: CRUNCH-216
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-216
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>
> The MapsideJoinStrategy currently specifies that the smaller table in the join (i.e. the table to be replicated and loaded in memory) should be on the right-hand side of the join.
> This is the opposite of what is done in all other join strategies, making it impossible to just switch out another join strategy for a MapsideJoinStrategy. The MapsideJoinStrategy could be brought in line with the other JoinStrategies to expect the smaller of two tables to be provided as the left-side table.
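
As a rough illustration of the size heuristic discussed in the comment, the sketch below picks a side to load in memory only when the two size estimates differ by a large factor, and declines to choose otherwise. SizedTable, chooseInMemorySide, and minRatio are hypothetical names used for illustration and are not part of the Crunch API. The estimates are assumed to come from something like getSize, which the comment notes can be unreliable.

```java
import java.util.Optional;

final class MapsideSideChooser {

    /** Hypothetical stand-in for a table plus its size estimate; not a Crunch type. */
    static final class SizedTable {
        final String name;
        final long estimatedBytes; // assumed to come from an estimate such as getSize()

        SizedTable(String name, long estimatedBytes) {
            this.name = name;
            this.estimatedBytes = estimatedBytes;
        }
    }

    /**
     * Returns the table that looks safe to replicate in memory, or empty when
     * the estimates are missing or too close together to trust.
     * minRatio is an assumed tuning knob, e.g. 10.0 for "ten times smaller".
     */
    static Optional<SizedTable> chooseInMemorySide(SizedTable left, SizedTable right, double minRatio) {
        long small = Math.min(left.estimatedBytes, right.estimatedBytes);
        long large = Math.max(left.estimatedBytes, right.estimatedBytes);
        // A non-positive estimate is treated as "unknown", so no choice is made.
        if (small <= 0 || (double) large / small < minRatio) {
            return Optional.empty();
        }
        return Optional.of(left.estimatedBytes <= right.estimatedBytes ? left : right);
    }

    public static void main(String[] args) {
        SizedTable users = new SizedTable("users", 10L * 1024 * 1024);
        SizedTable events = new SizedTable("events", 50L * 1024 * 1024 * 1024);
        // Argument order does not matter here; the smaller estimate wins.
        Optional<SizedTable> inMemory = chooseInMemorySide(events, users, 10.0);
        System.out.println(inMemory.map(t -> t.name).orElse("no clear choice"));
    }
}
```

Requiring a large ratio reflects the point made in the comment: when the gap between the two collections is very large, a fairly inaccurate estimate still points at the right side, and when it is not, the caller keeps the explicit argument order.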