hadoop-common-dev mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Checksum Error during Reduce Phase hadoop-1.0.2
Date Thu, 16 Aug 2012 18:34:47 GMT
Also, do you have ECC RAM?

On Aug 16, 2012, at 11:34 AM, Arun C Murthy wrote:

> Primarily, it could be caused by a corrupt disk, which is why checking whether it's
> happening on specific node(s) can help.
> 
> Arun
> 
> On Aug 16, 2012, at 10:04 AM, Pavan Kulkarni wrote:
> 
>> Harsh,
>> 
>> I see this on a couple of nodes. But what may be the cause of this error? Any
>> idea about it? Thanks
>> 
>> On Sun, Aug 12, 2012 at 9:06 AM, Harsh J <harsh@cloudera.com> wrote:
>> 
>>> Hi Pavan,
>>> 
>>> Do you see this happen on a specific node every time (i.e. when the
>>> reducer runs there)?
>>> 
>>> On Fri, Aug 10, 2012 at 11:43 PM, Pavan Kulkarni
>>> <pavan.baburao@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> I am running a Terasort on a cluster of 8 nodes. The map phase completes,
>>>> but when the reduce phase is around 68-70% I get the following error.
>>>> 
>>>> 12/08/10 11:02:36 INFO mapred.JobClient: Task Id : attempt_201208101018_0001_r_000027_0, Status : FAILED
>>>> java.lang.RuntimeException: problem advancing post rec#38320220
>>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
>>>>         at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
>>>>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
>>>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>         at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>>> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum Error
>>>>         at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:164)
>>>>         at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:374)
>>>>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$RawKVIteratorReader.next(ReduceTask.java:2531)
>>>>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1253)
>>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1212)
>>>>         ... 10 more
>>>> 
>>>> I came across someone facing the same issue
>>>> <http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201001.mbox/%3C1c802db51001280427j5b8e57dai4a8d0fdd038f41@mail.gmail.com%3E>
>>>> in the mail archives, and he seemed to resolve it by listing hostnames in
>>>> the /etc/hosts file. All my nodes have correct hostname entries in /etc/hosts,
>>>> but I still have these reducers throwing the error.
>>>> Any help regarding this issue is appreciated. Thanks
>>>> 
>>>> --
>>>> 
>>>> --With Regards
>>>> Pavan Kulkarni
>>> 
>>> 
>>> 
>>> --
>>> Harsh J
>>> 
>> 
>> 
>> 
>> -- 
>> 
>> --With Regards
>> Pavan Kulkarni
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
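
Editorial note on the questions above about disks and ECC RAM: a checksum error means the bytes being read no longer match the checksum recorded when they were written, so any hardware fault that flips a bit in between can trigger it. The following is a minimal sketch of that idea, not Hadoop's actual code. It uses the JDK's java.util.zip.CRC32 as a stand-in, and the class name ChecksumDemo, its helper methods, and the sample record are all hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class: shows a single flipped bit failing a CRC32 check.
public class ChecksumDemo {

    /** Computes a CRC32 checksum over the whole buffer. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Returns true when the data still matches the checksum recorded at write time. */
    static boolean verify(byte[] data, long expected) {
        return crc32(data) == expected;
    }

    public static void main(String[] args) {
        // Made-up record; stands in for bytes written to an intermediate file.
        byte[] record = "terasort-record-0001".getBytes(StandardCharsets.UTF_8);
        long written = crc32(record);

        // Simulate a single bit flip, as a bad disk sector or non-ECC RAM might cause.
        byte[] corrupted = record.clone();
        corrupted[3] ^= 0x01;

        System.out.println("intact:    " + verify(record, written));    // prints "intact:    true"
        System.out.println("corrupted: " + verify(corrupted, written)); // prints "corrupted: false"
    }
}
```

Because the check only says that the data changed, not where, narrowing the failures down to specific nodes (as suggested earlier in the thread) is what points to the faulty hardware.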


