From: "Gabriel Reid (JIRA)"
To: crunch-dev@incubator.apache.org
Date: Tue, 15 Oct 2013 06:49:45 +0000 (UTC)
Subject: [jira] [Commented] (CRUNCH-278) Improvements to MapsideJoin code

[ https://issues.apache.org/jira/browse/CRUNCH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794939#comment-13794939 ]

Gabriel Reid commented on CRUNCH-278:
-------------------------------------

Yeah, I think that could work for the more general case. Calling toBundle on a PCollection would then back up to the last call to materialize() and execute everything from there on in memory, and the default case would be to do nothing in memory.
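A rough way to picture the semantics described above is a toy lazy collection. This is a minimal sketch under stated assumptions: LazyCollection, source, map, materialize, toBundle and pendingSteps are hypothetical names invented for illustration, not Crunch's PCollection API, and nothing here runs a MapReduce job (the pinned "materialized" list merely stands in for materialized output). toBundle() walks back to the nearest materialized ancestor and replays only the remaining steps in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical toy model, NOT Crunch's PCollection API: each collection records
// its parent and an element-wise op; materialize() pins a computed result.
final class LazyCollection<T> {
  private final LazyCollection<?> parent;   // null for a source collection
  private final Function<Object, T> op;     // applied to each parent element; null for a source
  private List<T> materialized;             // non-null for sources and after materialize()

  private LazyCollection(LazyCollection<?> parent, Function<Object, T> op, List<T> data) {
    this.parent = parent;
    this.op = op;
    this.materialized = data;
  }

  static <T> LazyCollection<T> source(List<T> data) {
    return new LazyCollection<>(null, null, new ArrayList<>(data));
  }

  // Returns a new collection; the receiver itself is not changed.
  @SuppressWarnings("unchecked")
  <R> LazyCollection<R> map(Function<? super T, ? extends R> fn) {
    Function<Object, R> erased = o -> fn.apply((T) o); // safe: parent elements are T
    return new LazyCollection<>(this, erased, null);
  }

  // Stand-in for running a job and writing output: computes and pins the result.
  // This is the visible state change discussed in the thread.
  List<T> materialize() {
    if (materialized == null) {
      materialized = computeInMemory();
    }
    return Collections.unmodifiableList(materialized);
  }

  // Backs up to the nearest materialized ancestor and replays the rest in memory.
  List<T> toBundle() {
    return Collections.unmodifiableList(computeInMemory());
  }

  // Number of ops toBundle() would replay in memory from the last checkpoint.
  int pendingSteps() {
    return materialized != null ? 0 : 1 + parent.pendingSteps();
  }

  private List<T> computeInMemory() {
    if (materialized != null) {
      return materialized; // stop backing up at the last materialize() call
    }
    List<?> input = parent.computeInMemory();
    List<T> out = new ArrayList<>(input.size());
    for (Object element : input) {
      out.add(op.apply(element));
    }
    return out;
  }
}

public class ToBundleSketch {
  public static void main(String[] args) {
    LazyCollection<Integer> nums = LazyCollection.source(List.of(1, 2, 3));
    LazyCollection<Integer> doubled = nums.map(x -> x * 2);
    LazyCollection<String> labels = doubled.map(x -> "v" + x);

    System.out.println(labels.pendingSteps()); // 2: both map steps replayed in memory
    doubled.materialize();
    System.out.println(labels.pendingSteps()); // 1: replay now starts at doubled
    System.out.println(labels.toBundle());     // [v2, v4, v6]
  }
}
```

Calling doubled.materialize() changes how much work labels.toBundle() does (pendingSteps drops from 2 to 1) even though labels itself was never touched, which is exactly the kind of visible state change behind the immutability concern raised next.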
The only issue I see with this is that it turns materialize() into a call that visibly mutates the state of a PCollection. Materializing a PCollection already mutates state under the covers, but adding these semantics to materialize() very slightly breaks the idea of PCollection immutability. That's probably not a big enough reason not to take this approach, though.

> Improvements to MapsideJoin code
> --------------------------------
>
>                 Key: CRUNCH-278
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-278
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, MapReduce Patterns
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-278.patch
>
>
> The fact that we have special-case code in the MapsideJoinStrategy for the in-memory and MR-based Pipeline instances has always bugged me, so I set out to eliminate the distinction between the two impls by creating a new interface, ReadableSourceBundle, that encapsulates the MR and in-memory specific logic for doing mapside joins in order to remove the special-case code in MapsideJoinStrategy and hopefully make other implementations that use our mapside-join patterns much easier to test.

--
This message was sent by Atlassian JIRA
(v6.1#6144)