pig-user mailing list archives

From "Andrew Hitchcock" <adpow...@gmail.com>
Subject Re: error with pig job
Date Fri, 07 Dec 2007 00:40:50 GMT
The job now gets past the point where it failed before, but it still
dies. This time the error is an IOException, so I think it is a problem
with my cluster. I'm running the job again and I'll report back.

Thanks very much for the fast response. We are very grateful.
Andrew

On Dec 6, 2007 3:23 PM, Alan Gates <gates@yahoo-inc.com> wrote:
> Andrew,
>
> I've uploaded a patch that I think will fix your issue. You can find it
> here:
> https://issues.apache.org/jira/secure/attachment/12371190/pig7.patch
> If you get a chance, could you test it and see whether it resolves your
> issue?
>
> Alan.
>
>
> Utkarsh Srivastava wrote:
> > Alan, this is a problem with the combiner (the problem of putting an
> > indexed tuple directly into the bag, which was the first point in my
> > comment on the combiner patch that was committed). Some of the mappers
> > that spill their bags to disk have a problem reading them back,
> > because what was written out was an indexed tuple, while what the
> > reader expects is a regular Tuple.
> >
> >
> > Utkarsh
> >
> > On Dec 5, 2007, at 3:50 PM, Andrew Hitchcock wrote:
> >
> >> Hi folks,
> >>
> >> I'm having a problem with a Pig job I wrote: it is throwing exceptions
> >> in the map phase. I'm using the latest SVN of Pig, compiled against
> >> the Hadoop15 jar included in SVN. My cluster is running Hadoop 0.15.1
> >> on Java 1.6.0_03. Here's the pig job (which I ran through grunt):
> >>
> >> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
> >> (movie,user,rating,date);
> >> B = GROUP A BY movie;
> >> C = FOREACH B GENERATE group, COUNT(A.user) as ratingcount,
> >> AVG(A.rating) as averagerating;
> >> D = ORDER C BY averagerating;
> >> STORE D INTO 'output/output.tsv';
> >>
> >> A large number of map tasks fail (though not all; some succeed) with
> >> the following exception:
> >>
> >> error: Error message from task (map) tip_200712051644_0002_m_000003
> >> java.lang.RuntimeException: Unexpected data while reading tuple from binary file
> >>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:81)
> >>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:41)
> >>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:89)
> >>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
> >>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
> >>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
> >>     at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
> >>     at org.apache.pig.impl.eval.FuncEvalSpec$1.add(FuncEvalSpec.java:105)
> >>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:165)
> >>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:77)
> >>     at org.apache.pig.impl.mapreduceExec.PigCombine.reduce(PigCombine.java:101)
> >>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:439)
> >>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:418)
> >>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
> >>     at org.apache.pig.impl.mapreduceExec.PigMapReduce$MapDataOutputCollector.add(PigMapReduce.java:309)
> >>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
> >>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:242)
> >>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
> >>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
> >>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
> >>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
> >>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
> >>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
> >>     at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> >>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >>
> >> As a comparison, the following job runs successfully:
> >>
> >> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
> >> (movie,user,rating,date);
> >> B = FILTER A BY movie == '8';
> >> C = GROUP B BY movie;
> >> D = FOREACH C GENERATE group, COUNT(B.user) as ratingcount,
> >> AVG(B.rating) as averagerating;
> >> DUMP D;
> >>
> >> Any help in tracking this down would be greatly appreciated. So far,
> >> Pig is looking really slick and I'd love to write more advanced
> >> programs with it.
> >>
> >> Thanks,
> >> Andrew Hitchcock
> >
>
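
For readers following the thread, the mismatch Utkarsh describes (an indexed
tuple written to a spilled bag, a regular Tuple expected on read) can be
illustrated with a minimal sketch. This is not Pig's code: the class name,
the marker bytes, the record layout, and the sample field values are all
assumptions made up for illustration. Only the exception message is taken
from the stack trace above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: no name or byte value here comes from Pig's source.
class SpillMismatchSketch {
    // Assumed type markers written in front of each serialized record.
    static final byte TUPLE_MARKER = 0x01;
    static final byte INDEXED_TUPLE_MARKER = 0x02;

    // Serializes a plain tuple: marker, field count, then each field.
    static byte[] writeTuple(String... fields) {
        return write(TUPLE_MARKER, null, fields);
    }

    // Serializes an indexed tuple: a different marker plus an extra index.
    static byte[] writeIndexedTuple(int index, String... fields) {
        return write(INDEXED_TUPLE_MARKER, index, fields);
    }

    private static byte[] write(byte marker, Integer index, String[] fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(marker);
            if (index != null) {
                out.writeInt(index);
            }
            out.writeInt(fields.length);
            for (String f : fields) {
                out.writeUTF(f);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stands in for the reader of a spilled bag: it only understands plain
    // tuples, so any other record type is rejected.
    static String[] readTuple(byte[] spilled) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(spilled));
            if (in.readByte() != TUPLE_MARKER) {
                throw new RuntimeException(
                    "Unexpected data while reading tuple from binary file");
            }
            String[] fields = new String[in.readInt()];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = in.readUTF();
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Plain tuple: written and read back with the same format.
        String[] ok = readTuple(writeTuple("8", "u1", "3"));
        System.out.println("fields read back: " + ok.length);

        // Indexed tuple written, plain tuple expected: the read fails.
        try {
            readTuple(writeIndexedTuple(0, "8", "u1", "3"));
        } catch (RuntimeException e) {
            System.out.println("read failed: " + e.getMessage());
        }
    }
}
```

In this sketch the failure only appears when a record is actually serialized
and read back, which would be consistent with some tasks failing and others
succeeding, though the thread does not confirm that this is the whole story.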
