pig-user mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: error with pig job
Date Thu, 06 Dec 2007 17:05:57 GMT
Utkarsh,

I can submit a patch for this today.  Do you know of a simple test case 
that reproduces the error?

Alan.



Utkarsh Srivastava wrote:
> Alan, this is a problem with the combiner part (the problem of putting
> an indexed tuple directly into the bag, the first point in my comment
> about the combiner patch that was committed). Some of the mappers that
> spill their bags to disk have a problem reading them back, because what
> was written out was an indexed tuple, while what is expected to be read
> is a regular Tuple.
>
>
> Utkarsh
>
> On Dec 5, 2007, at 3:50 PM, Andrew Hitchcock wrote:
>
>> Hi folks,
>>
>> I'm having a problem with a Pig job I wrote; it throws exceptions
>> in the map phase. I'm using the latest SVN of Pig, compiled against
>> the Hadoop15 jar included in SVN. My cluster is running Hadoop 0.15.1
>> on Java 1.6.0_03. Here's the Pig job (which I ran through grunt):
>>
>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
>> (movie,user,rating,date);
>> B = GROUP A BY movie;
>> C = FOREACH B GENERATE group, COUNT(A.user) as ratingcount,
>> AVG(A.rating) as averagerating;
>> D = ORDER C BY averagerating;
>> STORE D INTO 'output/output.tsv';
>>
>> A large number of map tasks fail (but not all; some succeed) with the
>> following exception:
>>
>> error: Error message from task (map) tip_200712051644_0002_m_000003
>> java.lang.RuntimeException: Unexpected data while reading tuple from binary file
>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:81)
>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:41)
>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:89)
>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
>>     at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
>>     at org.apache.pig.impl.eval.FuncEvalSpec$1.add(FuncEvalSpec.java:105)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:165)
>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:77)
>>     at org.apache.pig.impl.mapreduceExec.PigCombine.reduce(PigCombine.java:101)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:439)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:418)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce$MapDataOutputCollector.add(PigMapReduce.java:309)
>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:242)
>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
>>
>> As a comparison, the following job runs successfully:
>>
>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
>> (movie,user,rating,date);
>> B = FILTER A BY movie == '8';
>> C = GROUP B BY movie;
>> D = FOREACH C GENERATE group, COUNT(B.user) as ratingcount,
>> AVG(B.rating) as averagerating;
>> DUMP D;
>>
>> Any help in tracking this down would be greatly appreciated. So far,
>> Pig is looking really slick and I'd love to write more advanced
>> programs with it.
>>
>> Thanks,
>> Andrew Hitchcock
>
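
The failure Utkarsh describes above is, in essence, a serialization mismatch on
the bag spill path: the combiner writes indexed tuples into the spilled bag, but
the reader that iterates the spill file expects plain tuples. The toy sketch
below reproduces just that mismatch; the class name (SpillMismatchSketch), the
marker bytes, and the method names are invented for illustration and are not
Pig's actual serialization code.

import java.io.*;

// Toy illustration only: not Pig's DataBagFileReader or Tuple classes.
public class SpillMismatchSketch {

    // Hypothetical record markers; Pig's real on-disk format differs.
    static final byte PLAIN_TUPLE   = 0x10;
    static final byte INDEXED_TUPLE = 0x11;

    // Spill path: writes the "indexed" flavor, which carries an extra
    // index field in front of the payload.
    static void spillIndexedTuple(DataOutput out, int index, String payload)
            throws IOException {
        out.writeByte(INDEXED_TUPLE);
        out.writeInt(index);
        out.writeUTF(payload);
    }

    // Read path: only understands the plain flavor, so the marker check
    // fails when it meets a record written by spillIndexedTuple().
    static String readPlainTuple(DataInput in) throws IOException {
        byte marker = in.readByte();
        if (marker != PLAIN_TUPLE) {
            // Same symptom as the stack trace above, reduced to a toy.
            throw new RuntimeException(
                "Unexpected data while reading tuple from binary file");
        }
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        // "Spill" one record to an in-memory buffer standing in for the spill file.
        ByteArrayOutputStream spill = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(spill)) {
            spillIndexedTuple(out, 7, "movie=8,user=42,rating=5");
        }

        // Reading it back with the plain-tuple reader reproduces the error.
        DataInput in = new DataInputStream(new ByteArrayInputStream(spill.toByteArray()));
        try {
            readPlainTuple(in);
        } catch (RuntimeException e) {
            System.out.println("Reader failed as expected: " + e.getMessage());
        }
    }
}

If the diagnosis holds, the fix would be along the lines Utkarsh's comment
suggests: convert the indexed tuple to a regular Tuple before it is put into the
bag (or teach the reader both layouts), so that spilled bags round-trip cleanly.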
