hadoop-common-user mailing list archives

From Dennis Kubes <nutch-...@dragonflymc.com>
Subject Re: Many Checksum Errors
Date Wed, 02 May 2007 04:55:29 GMT
I can read the files through 'hadoop fs -cat'.  Also, the errors most
often fix themselves once the tasks are rescheduled, although sometimes
enough of them occur that a single job fails.
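
For anyone who wants to repeat this kind of re-read check on a local file
outside of Hadoop, here is a minimal Python sketch. It is not Hadoop's own
checksum code: the 512-byte chunk size, the function names chunk_crcs and
unstable_offsets, and the use of zlib.crc32 are all assumptions made for
illustration. The idea is the same as re-running the cat: read the file
several times and see whether the data comes back identical each time.

```python
import zlib

# Assumed chunk size for illustration only; not taken from the cluster config.
CHUNK = 512


def chunk_crcs(path, chunk_size=CHUNK):
    """Return one CRC32 value per chunk_size-byte chunk of the file."""
    crcs = []
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            crcs.append(zlib.crc32(block) & 0xFFFFFFFF)
    return crcs


def unstable_offsets(path, passes=3, chunk_size=CHUNK):
    """Read the file `passes` times and return the sorted byte offsets of
    chunks whose CRC32 differed from the first read."""
    baseline = chunk_crcs(path, chunk_size)
    bad = set()
    for _ in range(passes - 1):
        current = chunk_crcs(path, chunk_size)
        if len(current) != len(baseline):
            # The file length changed between reads; flag the first
            # offset past the shorter of the two reads.
            bad.add(min(len(current), len(baseline)) * chunk_size)
        for i, (a, b) in enumerate(zip(baseline, current)):
            if a != b:
                bad.add(i * chunk_size)
    return sorted(bad)


if __name__ == "__main__":
    import sys

    # Usage: python reread_check.py <local file>
    print(unstable_offsets(sys.argv[1]))
```

An empty list means every pass returned the same bytes. Repeated reads may
be served from the OS page cache, so an empty result does not rule out a
disk problem. Offsets that differ between passes would point toward
unreliable hardware (memory, controller, disk) rather than a file that is
consistently corrupt.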

Dennis Kubes

Raghu Angadi wrote:
> 
> Can you manually try to read one such file with 'hadoop fs -cat'? If it 
> is not a transient software error, you should see the checksum error 
> again. Seeing the error does not confirm a hardware error, but if you 
> are able to read the file correctly, then it is most likely a Hadoop bug.
> 
> Raghu.
> 
> Dennis Kubes wrote:
>> All,
>>
>> We are continually experiencing checksum errors when running some jobs 
>> under heavy load (specifically merging segments or crawldbs).  I am 
>> lost as to whether this is a hardware or software problem.  Two 
>> questions: first, is anyone else experiencing a large number of 
>> checksum errors on big clusters?  Second, does anyone know whether 
>> this is hardware or software related?  Here are some examples.
>>
>> Dennis Kubes
>>
>>
>> org.apache.hadoop.fs.ChecksumException: Checksum error: /d01/hadoop/mapred/local/task_0042_m_001905_0/spill0.out at 79597056
>>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:258)
>>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
>>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
>>     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:176)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>     at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
>>     at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
>>     at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1669)
>>     at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2585)
>>     at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2356)
>>     at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2230)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:517)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:191)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1701)
>>
>>
>>
>> Map output lost, rescheduling: getMapOutput(task_0042_m_000375_0,4) failed :
>> org.apache.hadoop.fs.ChecksumException: Checksum error: /d01/hadoop/mapred/local/task_0042_m_000375_0/file.out at 20267008
>>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:258)
>>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
>>     at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
>>     at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>>     at java.io.DataInputStream.read(DataInputStream.java:134)
>>     at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1932)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>>     at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>>     at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>>     at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>>     at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>>     at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>>     at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>>     at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>>     at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>>     at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>>     at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
>>
> 
