Date: Sun, 16 Dec 2012 21:06:12 +0000 (UTC)
From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Subject: [jira] [Commented] (CRUNCH-128) Allow one stage of an MR pipeline to depend on another target being created

    [ https://issues.apache.org/jira/browse/CRUNCH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533517#comment-13533517 ]

Josh Wills commented on CRUNCH-128:
-----------------------------------

What if we created a ParallelDoOperation class in o.a.c, which could keep us down to five versions of pDo on PCollection, and we could use it for additional advanced options (like the optional SourceTargets) that we invent now and in the future?

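To make the suggestion above concrete, here is a minimal, self-contained sketch of what such an options object might look like. Everything in it is an assumption made for illustration: the builder shape, the method names `named`, `requires`, and `build`, and the `TargetRef` stand-in for a SourceTarget are hypothetical and do not reflect the actual Crunch API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParallelDoOperationSketch {

  /** Hypothetical stand-in for a Crunch SourceTarget; only a path is modeled. */
  static final class TargetRef {
    final String path;

    TargetRef(String path) {
      this.path = path;
    }
  }

  /**
   * Hypothetical options bundle. A single pDo overload could accept this
   * object instead of growing a new overload for every advanced option.
   */
  static final class ParallelDoOperation {
    private final String name;
    private final List<TargetRef> requiredTargets;

    private ParallelDoOperation(String name, List<TargetRef> requiredTargets) {
      this.name = name;
      this.requiredTargets = requiredTargets;
    }

    static Builder named(String name) {
      return new Builder(name);
    }

    String name() {
      return name;
    }

    /** Targets that must exist before the stage using these options runs. */
    List<TargetRef> requiredTargets() {
      return requiredTargets;
    }

    static final class Builder {
      private final String name;
      private final List<TargetRef> targets = new ArrayList<>();

      private Builder(String name) {
        this.name = name;
      }

      /** Records an optional dependency; future options would be added the same way. */
      Builder requires(TargetRef target) {
        targets.add(target);
        return this;
      }

      ParallelDoOperation build() {
        // Copy so later builder calls cannot mutate an already-built instance.
        return new ParallelDoOperation(
            name, Collections.unmodifiableList(new ArrayList<>(targets)));
      }
    }
  }

  public static void main(String[] args) {
    ParallelDoOperation op = ParallelDoOperation.named("mapside-join")
        .requires(new TargetRef("/tmp/small-side"))
        .build();
    System.out.println(op.name() + " requires " + op.requiredTargets().size() + " target(s)");
  }
}
```

The point of the sketch is only that new advanced options become new builder methods rather than new pDo overloads on PCollection.
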
> Allow one stage of an MR pipeline to depend on another target being created
> ---------------------------------------------------------------------------
>
>                 Key: CRUNCH-128
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-128
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Josh Wills
>         Attachments: CheckpointingIT.java, CRUNCH-128.patch, CRUNCH-128v2.patch
>
>
> There are a couple of problems (e.g., mapside-joins, total orderings, etc.) where we need to guarantee that one PCollection has been written to the FileSystem before another MapReduce pipeline that depends on that file is allowed to run. This doesn't fit cleanly into the current set of abstractions for Crunch, which is why we force pipelines to execute via the run command to guarantee that the files have been created before the second stage is run.
> We should add the ability for a particular PCollection to require that a SourceTarget instance has been created before it can be executed, and the planner should incorporate this information into the MR pipeline planning process.
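
One way to picture the planner change requested in the issue is as an ordering problem: a stage that requires a target must be scheduled after the stage that writes it. The sketch below is a hedged illustration using a plain topological sort, not the Crunch planner. The `Stage` class, its `writes` and `requires` fields, and the `order` method are hypothetical names, and the sketch assumes that a required target with no producing stage already exists on the FileSystem.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StageOrderingSketch {

  /** Hypothetical MR stage: the target paths it writes and the ones it requires. */
  static final class Stage {
    final String name;
    final Set<String> writes;
    final Set<String> requires;

    Stage(String name, List<String> writes, List<String> requires) {
      this.name = name;
      this.writes = new LinkedHashSet<>(writes);
      this.requires = new LinkedHashSet<>(requires);
    }
  }

  /** Returns stage names in an order where every required target is written first. */
  static List<String> order(List<Stage> stages) {
    // Map each target path to the stage that produces it.
    Map<String, Stage> producer = new HashMap<>();
    for (Stage s : stages) {
      for (String target : s.writes) {
        producer.put(target, s);
      }
    }

    // Count unmet dependencies per stage and record who is waiting on whom.
    Map<Stage, Integer> pending = new LinkedHashMap<>();
    Map<Stage, List<Stage>> dependents = new HashMap<>();
    for (Stage s : stages) {
      int count = 0;
      for (String target : s.requires) {
        Stage p = producer.get(target);
        if (p != null && p != s) { // no producer: assumed to exist already
          dependents.computeIfAbsent(p, k -> new ArrayList<>()).add(s);
          count++;
        }
      }
      pending.put(s, count);
    }

    // Kahn's algorithm: repeatedly schedule stages with no unmet dependencies.
    Deque<Stage> ready = new ArrayDeque<>();
    for (Map.Entry<Stage, Integer> e : pending.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }
    List<String> result = new ArrayList<>();
    while (!ready.isEmpty()) {
      Stage s = ready.poll();
      result.add(s.name);
      for (Stage d : dependents.getOrDefault(s, Collections.<Stage>emptyList())) {
        if (pending.merge(d, -1, Integer::sum) == 0) {
          ready.add(d);
        }
      }
    }
    if (result.size() != stages.size()) {
      throw new IllegalStateException("Cyclic target dependency between stages");
    }
    return result;
  }

  public static void main(String[] args) {
    Stage join = new Stage("join", Arrays.asList("/out/joined"), Arrays.asList("/tmp/small-side"));
    Stage prep = new Stage("prep", Arrays.asList("/tmp/small-side"), Collections.<String>emptyList());
    System.out.println(order(Arrays.asList(join, prep)));
  }
}
```

In this toy model the `join` stage is held back until `prep` has produced `/tmp/small-side`, which is the guarantee the issue currently obtains by forcing an explicit run between the two stages.
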