hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-573) Checksum error during sorting in reducer
Date Tue, 03 Oct 2006 17:33:21 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-573?page=comments#action_12439567 ] 
Doug Cutting commented on HADOOP-573:

Sort is actually the place where most checksum errors have been reported.  I believe this
is because sorting keeps data in memory longer than other operations do, increasing the chance
that it will be corrupted there.  Does this node have ECC memory?  If so, memory errors are
unlikely.  Sorting also accounts for a large share of the writes to disk, so the corruption
could have happened there.  It would be worth examining the syslog on that node to see whether
any disk or memory errors are reported.
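For readers unfamiliar with where the error in the stack trace comes from: a minimal sketch of the kind of read-side verification the Checker performs, using hypothetical names and plain CRC32, not Hadoop's actual implementation. A checksum written alongside each chunk at write time is recomputed on read; any bit flipped in memory or on disk in between trips the check.

```java
import java.util.zip.CRC32;

// Sketch only (hypothetical class and method names): verify a data chunk
// against the checksum that was stored alongside it when it was written.
public class ChecksumDemo {

    // Compute a CRC32 over the chunk.
    static long checksum(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // Throws if the chunk read back no longer matches its stored checksum --
    // this is where corruption in memory or on disk surfaces.
    static void verify(byte[] chunk, long storedSum, long pos) {
        if (checksum(chunk, 0, chunk.length) != storedSum) {
            throw new RuntimeException("Checksum error at " + pos);
        }
    }

    public static void main(String[] args) {
        byte[] chunk = "sorted record data".getBytes();
        long sum = checksum(chunk, 0, chunk.length); // stored at write time
        verify(chunk, sum, 0);       // intact data passes silently

        chunk[3] ^= 0x40;            // simulate a single flipped bit
        boolean caught = false;
        try {
            verify(chunk, sum, 0);
        } catch (RuntimeException e) {
            caught = true;
        }
        System.out.println(caught ? "corruption detected" : "missed");
    }
}
```

Note that the check can only detect corruption, not locate its cause: a mismatch at read time could equally come from bad RAM, a failing disk, or a controller fault, which is why inspecting the node's syslog is the next step.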

I assume the reduce was rescheduled and completed?  If so, then I will resolve this issue.

> Checksum error during sorting in reducer
> ----------------------------------------
>                 Key: HADOOP-573
>                 URL: http://issues.apache.org/jira/browse/HADOOP-573
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
> Many reduce tasks got killed due to a checksum error. The strange thing is that the file was generated by the sort function and was on a local disk. Here is the stack:
> Checksum error:  ../task_0011_r_000140_0/all.2.1 at 5342920704
> 	at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:134)
> 	at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:110)
> 	at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:170)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:176)
> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
> 	at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:1061)
> 	at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:1126)
> 	at org.apache.hadoop.io.SequenceFile$Reader.nextRaw(SequenceFile.java:1354)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergeStream.next(SequenceFile.java:1880)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:1938)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:1802)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.mergePass(SequenceFile.java:1749)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:1494)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1066)

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

