pig-user mailing list archives

From "Andrew Hitchcock" <adpow...@gmail.com>
Subject Re: error with pig job
Date Fri, 07 Dec 2007 01:02:46 GMT
I ran the job again and it succeeded without problems. Thanks again!

On Dec 6, 2007 4:54 PM, Alan Gates <gates@yahoo-inc.com> wrote:
> I finally managed to reproduce your error on my end, and then tested it
> against my fix, which did resolve the issue.  I'll be checking in the
> fix shortly.
>
> Alan.
>
>
> Andrew Hitchcock wrote:
> > The job got past the point where it failed before, but it still died.
> > The error was an IOException, so I think it is a problem with my
> > cluster. I'm running the job again and I'll report back.
> >
> > Thanks very much for the fast response. We are very grateful.
> > Andrew
> >
> > On Dec 6, 2007 3:23 PM, Alan Gates <gates@yahoo-inc.com> wrote:
> >
> >> Andrew,
> >>
> >> I've uploaded a patch that I think will fix your issue. You can find it
> >> here:
> >> https://issues.apache.org/jira/secure/attachment/12371190/pig7.patch
> >> If you get a chance, could you test it and see if it resolves your issue?
> >>
> >> Alan.
> >>
> >>
> >> Utkarsh Srivastava wrote:
> >>
> >>> Alan, this is a problem with the combiner part (the problem of putting
> >>> an indexed tuple directly into the bag; the first point in my comment
> >>> about the combiner patch that was committed). Some of the mappers that
> >>> spill their bags to disk have a problem reading them back, because
> >>> what was written out was an indexed tuple, while what is expected to
> >>> be read is a regular Tuple.
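> >>>
> >>> To make the failure mode concrete, here is a minimal, self-contained
> >>> Java sketch of that kind of mismatch. The Tuple/IndexedTuple classes
> >>> and the SpillMismatch driver below are simplified, hypothetical
> >>> stand-ins for illustration only, not Pig's actual implementations:
> >>>
> >>> import java.io.*;
> >>>
> >>> // Simplified stand-in for a plain tuple (hypothetical, not Pig's class).
> >>> class Tuple {
> >>>     String field;
> >>>     void write(DataOutput out) throws IOException { out.writeUTF(field); }
> >>>     void readFields(DataInput in) throws IOException { field = in.readUTF(); }
> >>> }
> >>>
> >>> // Stand-in for an indexed tuple: it writes one extra field on the way out.
> >>> class IndexedTuple extends Tuple {
> >>>     int index;
> >>>     @Override
> >>>     void write(DataOutput out) throws IOException {
> >>>         super.write(out);
> >>>         out.writeInt(index); // a plain-Tuple reader never consumes this int
> >>>     }
> >>> }
> >>>
> >>> public class SpillMismatch {
> >>>     public static void main(String[] args) throws IOException {
> >>>         // "Spill": indexed tuples are written to the bag's backing store.
> >>>         ByteArrayOutputStream spill = new ByteArrayOutputStream();
> >>>         DataOutputStream out = new DataOutputStream(spill);
> >>>         IndexedTuple t = new IndexedTuple();
> >>>         t.field = "8,12345,4,2005-09-06"; // illustrative record only
> >>>         t.index = 1;
> >>>         t.write(out);
> >>>         t.write(out); // a second record makes the desynchronization visible
> >>>
> >>>         // Read back expecting plain Tuples: the unread index bytes stay in
> >>>         // the stream, so later reads start mid-record and return garbage or
> >>>         // throw, much like the "Unexpected data while reading tuple" error.
> >>>         DataInputStream in = new DataInputStream(
> >>>                 new ByteArrayInputStream(spill.toByteArray()));
> >>>         Tuple first = new Tuple();
> >>>         first.readFields(in);  // happens to succeed
> >>>         Tuple second = new Tuple();
> >>>         second.readFields(in); // out of sync from here on
> >>>     }
> >>> }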
> >>>
> >>>
> >>> Utkarsh
> >>>
> >>> On Dec 5, 2007, at 3:50 PM, Andrew Hitchcock wrote:
> >>>
> >>>
> >>>> Hi folks,
> >>>>
> >>>> I'm having a problem with a Pig job I wrote; it throws exceptions
> >>>> in the map phase. I'm using the latest SVN of Pig, compiled against
> >>>> the Hadoop15 jar included in SVN. My cluster is running Hadoop 0.15.1
> >>>> on Java 1.6.0_03. Here's the Pig job (which I ran through grunt):
> >>>>
> >>>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
> >>>> (movie,user,rating,date);
> >>>> B = GROUP A BY movie;
> >>>> C = FOREACH B GENERATE group, COUNT(A.user) as ratingcount,
> >>>> AVG(A.rating) as averagerating;
> >>>> D = ORDER C BY averagerating;
> >>>> STORE D INTO 'output/output.tsv';
> >>>>
> >>>> A large number of map tasks fail (though not all; some succeed) with
> >>>> the following exception:
> >>>>
> >>>> error: Error message from task (map) tip_200712051644_0002_m_000003
> >>>> java.lang.RuntimeException: Unexpected data while reading tuple from binary file
> >>>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:81)
> >>>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:41)
> >>>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:89)
> >>>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
> >>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
> >>>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
> >>>>     at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
> >>>>     at org.apache.pig.impl.eval.FuncEvalSpec$1.add(FuncEvalSpec.java:105)
> >>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:165)
> >>>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:77)
> >>>>     at org.apache.pig.impl.mapreduceExec.PigCombine.reduce(PigCombine.java:101)
> >>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:439)
> >>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:418)
> >>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
> >>>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce$MapDataOutputCollector.add(PigMapReduce.java:309)
> >>>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
> >>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:242)
> >>>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
> >>>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
> >>>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
> >>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
> >>>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
> >>>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
> >>>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
> >>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >>>>
> >>>> As a comparison, the following job runs successfully:
> >>>>
> >>>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
> >>>> (movie,user,rating,date);
> >>>> B = FILTER A BY movie == '8';
> >>>> C = GROUP B BY movie;
> >>>> D = FOREACH C GENERATE group, COUNT(B.user) as ratingcount,
> >>>> AVG(B.rating) as averagerating;
> >>>> DUMP D;
> >>>>
> >>>> Any help in tracking this down would be greatly appreciated. So far,
> >>>> Pig is looking really slick and I'd love to write more advanced
> >>>> programs with it.
> >>>>
> >>>> Thanks,
> >>>> Andrew Hitchcock
> >>>>
>
