crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-361) Adjust the planner to handle non-existent SourceTargets
Date Thu, 27 Feb 2014 23:44:19 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre updated CRUNCH-361:
----------------------------------

    Summary: Adjust the planner to handle non-existent SourceTargets (was: Illegal State Exception)

> Adjust the planner to handle non-existent SourceTargets
> -------------------------------------------------------
>
>                 Key: CRUNCH-361
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-361
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Jinal Shah
>            Assignee: Josh Wills
>            Priority: Minor
>
> I was trying to use ParallelDoOptions to tell the planner that a parallelDo depends on a particular SourceTarget. When you pass the SourceTarget to it and then perform a union or cogroup on the resulting PCollection in a later step, the planner tries to compute the size of the parent source, which has not been generated yet. Here are the steps to reproduce it:
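> {code}
> import org.apache.crunch.PCollection;
> import org.apache.crunch.ParallelDoOptions;
> import org.apache.crunch.SourceTarget;
> import org.apache.crunch.io.At;
>
> PCollection<U> collection = afterSomeOperation();
> // Any SourceTarget implementation will do; a sequence file is used here as an example.
> SourceTarget<U> marker = At.sequenceFile(pathThatDoesNotExist, collection.getPType());
> pipeline.write(collection, marker);
> PCollection<U> collection2 = pipeline.read(marker);
> // Declare that this parallelDo depends on the marker being written first.
> PCollection<V> collection3 = collection2.parallelDo(doFn, ptypeV,
>     ParallelDoOptions.builder().sources(marker).build());
> doSomeMoreOperation();
> PCollection<V> union = collection3.union(somePCollectionOfV);
> {code}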
> This throws an IllegalStateException because the union cannot determine the size of the marker, which has not been generated yet. The planner should recognize that the Source does not exist yet and that an earlier job in the same pipeline will generate it.
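The behavior the report asks for can be sketched in plain Java. This is a hypothetical, simplified model, not Crunch's planner or API: the class `PendingSourceSizes`, its methods `naiveSize` and `plannerAwareSize`, the in-memory map standing in for the file system, and the fixed fallback estimate are all assumptions made for illustration. It contrasts a size lookup that fails when the path is missing with one that first checks whether some job in the same pipeline will write that path.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of a planner's source-size lookup (NOT Crunch code).
 * It only illustrates treating "not yet written" outputs as pending.
 */
class PendingSourceSizes {

    /** Stand-in for the file system: path -> size in bytes for paths that exist. */
    private final Map<String, Long> existingFiles = new HashMap<>();

    /** Paths that some job earlier in the same pipeline is scheduled to write. */
    private final Set<String> pendingOutputs = new HashSet<>();

    /** Assumed fallback size used when a pending output's real size is unknown. */
    private final long defaultEstimate;

    PendingSourceSizes(long defaultEstimate) {
        this.defaultEstimate = defaultEstimate;
    }

    void addExistingFile(String path, long bytes) {
        existingFiles.put(path, bytes);
    }

    void markPendingOutput(String path) {
        pendingOutputs.add(path);
    }

    /** Behavior described in the report: a missing path makes planning fail. */
    long naiveSize(String path) {
        Long size = existingFiles.get(path);
        if (size == null) {
            throw new IllegalStateException("Cannot determine size of " + path);
        }
        return size;
    }

    /** Requested behavior: a path that a pending job will produce gets an estimate. */
    long plannerAwareSize(String path) {
        Long size = existingFiles.get(path);
        if (size != null) {
            return size;
        }
        if (pendingOutputs.contains(path)) {
            return defaultEstimate;
        }
        // Genuinely missing input that nothing in the pipeline produces.
        throw new IllegalStateException(path + " does not exist and no job writes it");
    }

    public static void main(String[] args) {
        PendingSourceSizes sizes = new PendingSourceSizes(1L << 30);
        sizes.addExistingFile("/data/input", 4096L);
        sizes.markPendingOutput("/tmp/marker");

        System.out.println(sizes.plannerAwareSize("/data/input")); // prints 4096
        System.out.println(sizes.plannerAwareSize("/tmp/marker")); // prints 1073741824
        try {
            sizes.naiveSize("/tmp/marker");
        } catch (IllegalStateException e) {
            System.out.println("naive lookup failed: " + e.getMessage());
        }
    }
}
```

Falling back to a fixed estimate is only one possible choice; a real planner could instead defer size-dependent decisions until the job that writes the path has finished.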



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
