Date: Thu, 27 Feb 2014 23:46:19 +0000 (UTC)
From: "Micah Whitacre (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Updated] (CRUNCH-361) Adjust the planner to handle non-existent SourceTargets

     [ https://issues.apache.org/jira/browse/CRUNCH-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre updated CRUNCH-361:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> Adjust the planner to handle non-existent SourceTargets
> -------------------------------------------------------
>
>                 Key: CRUNCH-361
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-361
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Jinal Shah
>            Assignee: Josh Wills
>            Priority: Minor
>
> I was trying to use ParallelDoOptions to tell the planner to handle a step in a certain way. When you pass a SourceTarget to it and then do a union or co-group in a later step on the PCollection that was generated, the planner tries to find the size of the parent source, which has not been generated yet. Here are the steps to reproduce it:
> {code}
> PCollection collection = afterSomeOperation();
> SourceTarget marker = new SourceTarget(pathThatDoesNotExist); // this could be any SourceTarget implementation
> pipeline.write(collection, marker);
> PCollection collection2 = pipeline.read(marker);
> PCollection collection3 = collection2.parallelDo(DoFn, PType, ParallelDoOptions.builder().sourceTargets(marker).build());
> doSomeMoreOperation();
> PCollection union = collection3.union(SomePCollectionOfV);
> {code}
> This throws an exception, because the union cannot find the size of the marker, which has not been generated yet. The planner should recognize that the source does not exist yet and that a job in the pipeline will generate it.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
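
To illustrate the behaviour the report asks for, here is a minimal sketch in plain Java, assuming a size lookup that falls back to a caller-supplied estimate when the path has not been written yet. It is not Crunch's planner code: the class SizeEstimator, the method sizeOrFallback, and the fallbackBytes parameter are all hypothetical names, and it uses java.nio.file on the local filesystem rather than a Hadoop FileSystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of the Crunch API. */
public final class SizeEstimator {
    private SizeEstimator() {}

    /**
     * Returns the size in bytes of the file at {@code path}. If the path does
     * not exist yet (for example, because an earlier job in the same pipeline
     * has not written it), returns {@code fallbackBytes} instead of failing.
     */
    public static long sizeOrFallback(Path path, long fallbackBytes) {
        try {
            return Files.size(path);
        } catch (NoSuchFileException e) {
            // Assumption: a missing source will be produced later, so use the estimate.
            return fallbackBytes;
        } catch (IOException e) {
            // Any other I/O problem is still reported to the caller.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crunch361");
        Path marker = dir.resolve("marker-not-written-yet");

        // The marker does not exist yet, so the fallback estimate is used.
        System.out.println(sizeOrFallback(marker, 1024L));

        // Once the marker has been written, the real size is reported.
        Files.write(marker, new byte[] {1, 2, 3});
        System.out.println(sizeOrFallback(marker, 1024L));
    }
}
```

Whether a fixed fallback is the right choice for the planner is a separate question; the sketch only shows that a missing source can be treated as "not generated yet" rather than as an error.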