pig-user mailing list archives

From: Alan Gates <ga...@yahoo-inc.com>
Subject: Re: error with pig job
Date: Fri, 07 Dec 2007 00:54:45 GMT
I finally managed to reproduce your error on my end, and then tested it 
against my fix, which did resolve the issue.  I'll be checking in the 
fix shortly.

Alan.

Andrew Hitchcock wrote:
> The job got past the point where it failed before, but it still died.
> The error was an IOException, so I think it is a problem with my
> cluster. I'm running the job again and I'll report back.
>
> Thanks very much for the fast response. We are very grateful.
> Andrew
>
> On Dec 6, 2007 3:23 PM, Alan Gates <gates@yahoo-inc.com> wrote:
>   
>> Andrew,
>>
>> I've uploaded a patch that I think will fix your issue.  You can find it
>> here:
>> https://issues.apache.org/jira/secure/attachment/12371190/pig7.patch  If
>> you get a chance, could you test and see if this resolves your issue?
>>
>> Alan.
>>
>>
>> Utkarsh Srivastava wrote:
>>     
>>> Alan, this is a problem with the combiner part (the problem of putting
>>> an indexed tuple directly into the bag, the first point in my comment
>>> about the combiner patch that was committed). Some of the mappers that
>>> spill their bags to disk have a problem reading them back, because what
>>> was written out was an indexed tuple, while what is expected on read is
>>> a regular Tuple.
>>>
>>>
>>> Utkarsh
>>>
>>>
>>>
>>>
>>>
>>>
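A rough, self-contained sketch of the mismatch Utkarsh describes above; the
classes below are simplified stand-ins for illustration, not Pig's actual
internals:

import java.util.ArrayList;
import java.util.List;

public class SpillSketch {

    // Stand-in for a plain Tuple: just an ordered list of fields.
    static class PlainTuple {
        final List<Object> fields = new ArrayList<Object>();
    }

    // Stand-in for the combiner's indexed tuple: a tuple that also carries
    // an index recording which input it came from.
    static class IndexedTuple extends PlainTuple {
        int index;
    }

    // What the stack trace points at: the indexed tuple goes straight into
    // the bag, so when the bag spills, the on-disk record carries the extra
    // index and the reader, which expects a plain Tuple layout, fails.
    static void addToBagBuggy(List<PlainTuple> bag, IndexedTuple it) {
        bag.add(it);
    }

    // The shape of the fix: copy the fields into a plain Tuple first, so
    // whatever is spilled to disk is exactly what the reader expects back.
    static void addToBagFixed(List<PlainTuple> bag, IndexedTuple it) {
        PlainTuple plain = new PlainTuple();
        plain.fields.addAll(it.fields);
        bag.add(plain);
    }

    public static void main(String[] args) {
        IndexedTuple it = new IndexedTuple();
        it.index = 0;
        it.fields.add("8");    // movie
        it.fields.add(4.5);    // rating
        List<PlainTuple> bag = new ArrayList<PlainTuple>();
        addToBagFixed(bag, it);
        System.out.println("bag[0] has " + bag.get(0).fields.size() + " fields");
    }
}

In Pig itself the tuple, bag, and combiner classes look different, of course;
the point is only that whatever the combiner puts in the bag is what gets
written out on a spill, so it has to be in the layout DataBagFileReader
expects when the bag is read back.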
>>> On Dec 5, 2007, at 3:50 PM, Andrew Hitchcock wrote:
>>>
>>>       
>>>> Hi folks,
>>>>
>>>> I'm having a problem with a Pig job I wrote; it is throwing exceptions
>>>> in the map phase. I'm using the latest SVN of Pig, compiled against
>>>> the Hadoop15 jar included in SVN. My cluster is running Hadoop 0.15.1
>>>> on Java 1.6.0_03. Here's the pig job (which I ran through grunt):
>>>>
>>>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
>>>> (movie,user,rating,date);
>>>> B = GROUP A BY movie;
>>>> C = FOREACH B GENERATE group, COUNT(A.user) as ratingcount,
>>>> AVG(A.rating) as averagerating;
>>>> D = ORDER C BY averagerating;
>>>> STORE D INTO 'output/output.tsv';
>>>>
>>>> A large number of the map tasks fail (but not all; some succeed) with
>>>> the following exception:
>>>>
>>>> error: Error message from task (map) tip_200712051644_0002_m_000003
>>>> java.lang.RuntimeException: Unexpected data while reading tuple from binary file
>>>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:81)
>>>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:41)
>>>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:89)
>>>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
>>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
>>>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
>>>>     at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
>>>>     at org.apache.pig.impl.eval.FuncEvalSpec$1.add(FuncEvalSpec.java:105)
>>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.<init>(GenerateSpec.java:165)
>>>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:77)
>>>>     at org.apache.pig.impl.mapreduceExec.PigCombine.reduce(PigCombine.java:101)
>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:439)
>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:418)
>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
>>>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce$MapDataOutputCollector.add(PigMapReduce.java:309)
>>>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:242)
>>>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
>>>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
>>>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
>>>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
>>>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
>>>>
>>>> As a comparison, the following job runs successfully:
>>>>
>>>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
>>>> (movie,user,rating,date);
>>>> B = FILTER A BY movie == '8';
>>>> C = GROUP B BY movie;
>>>> D = FOREACH C GENERATE group, COUNT(B.user) as ratingcount,
>>>> AVG(B.rating) as averagerating;
>>>> DUMP D;
>>>>
>>>> Any help in tracking this down would be greatly appreciated. So far,
>>>> Pig is looking really slick and I'd love to write more advanced
>>>> programs with it.
>>>>
>>>> Thanks,
>>>> Andrew Hitchcock
>>>>         
