crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Micah Whitacre <mkw...@gmail.com>
Subject Re: Pipeline throwing No Output? exception
Date Thu, 20 Mar 2014 17:50:39 GMT
Jinal can you elaborate on the "//do something" section of the code?  I
thought when I heard it described other PCollections were being joined with
the emptyPCollection and it was the outcome of the joins and additional
processing that was actually being persisted.


On Thu, Mar 20, 2014 at 10:18 AM, Chao Shi <stepinto@live.com> wrote:

> Hi Josh and Jinal,
>
> This was introduced to help the following case: In one of our MR programs,
> there is a command line option that one can optionally specify a path to
> data to be joined on. Before introducing emptyCollection(), we have to do
> like this:
>
> Path path = ...
> PCollection in1 = null;
> if (path != null) {
>   in = pipeline.read(...);
> }
> PCollection in2 = pipeline.read(...);
> if (in1 != null) {
>   in2 = in2.join(in1);
> }
>
> You can see checks for null everywhere. With emptyPColleciton, we can do
> this:
>
> if (path != null) {
>   in2 = pipeline.read();
> } else {
>   in2 = emptyPCollection();
> }
> in1.join(in2)
>
> I think Jinal's case should be a bad case for our current implementation.
> Perhaps we should change it to create an empty output directory rather than
> report an error, which doesn't start the MR and can save the job start-up
> time. This is the benefit for knowing PCollection in plan-time.
>
>
> 2014-03-16 23:34 GMT+08:00 Josh Wills <josh.wills@gmail.com>:
>
> > +chao
> >
> > Inlined.
> >
> > On Sat, Mar 15, 2014 at 12:34 PM, Jinal Shah <jinalshah2007@gmail.com
> >wrote:
> >
> >> Hi,
> >>
> >> I actually came across a particular case I'm not sure whether the
> behavior
> >> is right or not. So here is what is happening I am getting No Output
> >> exception throwing while trying to run my Crunch job. On further
> >> investigating I found that I was using Pipeline.emptyCollection(). So
> here
> >> is how my scenario looks like
> >>
> >> PCollection<V> collectionWhichCouldBeEmpty = null;
> >> if(path.exists){
> >>     collectionWhichCouldBeEmpty= pipeline.read(FromPath, PType.V);
> >> } else{
> >>     collectionWhichCouldBeEmpty = pipeline.emptyPCollection();
> >> }
> >>
> >> //do some operations
> >>
> >> pipeline.write(Target);
> >>
> >> pipeline.run()// this is where it is throwing the error
> >>
> >>
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java#L287
> >>
> >> On further debugging I found that the Vertex didn't have an input.
> >>
> >>
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java#L275
> >>
> >>
> >> So If I use the pipeline.read and it creates an Empty PCollection it
> works
> >> since it has the input source but If I create an Empty PCollection using
> >> the pipeline.emptyPCollection which doesn't have an input source then it
> >> fails
> >>
> >> Not sure if the case is missed or it has to be like this.
> >>
> >
> > It's a good question, and I'm not sure of the answer. Added Chao to the
> > To: line to ask him what the intention was in this case.
> >
> >
> >> Thanks
> >> Jinal
> >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message