hadoop-common-user mailing list archives

From: Avain <avaini...@gmail.com>
Subject: Re: Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
Date: Thu, 21 Oct 2010 14:25:51 GMT

Sent from my iPod

On Oct 21, 2010, at 6:44 PM, Erik Forsberg <forsberg@opera.com> wrote:

> On Wed, 20 Oct 2010 19:49:09 +0200
> Erik Forsberg <forsberg@opera.com> wrote:
>
>> Hi!
>>
>> I'm running Cloudera CDH2 update 2 (hadoop-0.20 0.20.1+169.113), and
>> after the upgrade I'm getting the following error in the reducers
>> during the copy phase in one of my larger jobs:
>>
>> 2010-10-20 17:43:22,343 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 12 segments...
>> 2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Merging 12 sorted segments
>> 2010-10-20 17:43:22,344 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 12 segments left of total size: 382660295 bytes
>> 2010-10-20 17:43:22,517 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201010201640_0001_r_000000_0 Merging of the local FS files threw an exception: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
>>     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
>
> What does an EOFException in this code actually mean? Is it hiding some
> other error that could tell me more about what's wrong?
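
In 0.20, WritableComparator.compare(byte[], ...) deserializes both keys
with readFields() before calling compareTo(), and wraps any IOException
from the deserialization in a RuntimeException, which matches the
nesting in your trace. So the EOFException means the merger ran out of
bytes in the middle of a key: either the intermediate map output is
truncated or corrupt, or, if the job uses a custom key type, its
readFields() reads more bytes than its write() wrote. A minimal sketch
of the latter case (BrokenKey is a made-up example, not your code):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    // Hypothetical key type with a write()/readFields() mismatch.
    public class BrokenKey implements WritableComparable<BrokenKey> {
        private int id;
        private long stamp;

        public void write(DataOutput out) throws IOException {
            out.writeInt(id);      // serializes 4 bytes per key...
        }

        public void readFields(DataInput in) throws IOException {
            id = in.readInt();
            stamp = in.readLong(); // ...but deserializes 12, so the
                                   // comparator hits EOF mid-stream
        }

        public int compareTo(BrokenKey other) {
            return id < other.id ? -1 : (id == other.id ? 0 : 1);
        }
    }

With stock keys (Text, LongWritable, ...) that possibility drops out,
and truncated map output (bad disk, flaky compression codec) becomes
the more likely explanation.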
>
> I'm seeing quite a few of these in my datanode logs:
>
> 2010-10-21 10:21:01,149 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.20.11.66:50010, storageID=DS-71308762-10.20.11.66-50010-1269957604444, infoPort=50075, ipcPort=50020):Got exception while serving blk_1081044479123523815_4852013 to /10.20.11.88:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.20.11.66:50010 remote=/10.20.11.88:41347]
>     at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>     at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>     at java.lang.Thread.run(Thread.java:619)
>
> Could that be related somehow?
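
Possibly, but note that 480000 ms is just the default
dfs.datanode.socket.write.timeout (8 minutes): the DataNode gave up
waiting for the client at 10.20.11.88 to read more of the block.
Reducers that stall or die mid-fetch produce exactly this, so it looks
more like a symptom of the failing merges than a cause. If you want to
rule it out anyway, the timeout can be raised (or set to 0 to disable
it) on the datanodes; a sketch, not a recommendation:

    <!-- hdfs-site.xml: give slow readers longer before the DataNode
         aborts the transfer; the value is in milliseconds -->
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>960000</value>
    </property>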
>
> I'm also seeing large numbers of mortbay exceptions, but MAPREDUCE-5
> says they are harmless.
>
>> *) Running with and without compressed map output, no difference.
>> *) With -Xmx512m and -Xmx768m, no difference.
>> *) Decreasing the number of mappers and reducers on all nodes to
>>    reduce overall load.
>> *) Decreasing mapred.reduce.parallel.copies from 16 to 5 (the default)
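
(For reference, those knobs as 0.20 job configuration entries; the
values shown are just ones from the list above, not recommendations:)

    <property>
      <name>mapred.compress.map.output</name>
      <value>false</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx768m</value>
    </property>
    <property>
      <name>mapred.reduce.parallel.copies</name>
      <value>5</value>
    </property>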
>
> Also tried doubling the number of reducers to get each reducer to
> process less data, but that didn't help either :-(
>
> \EF
> -- 
> Erik Forsberg <forsberg@opera.com>
> Developer, Opera Software - http://www.opera.com/
