hadoop-user mailing list archives

From Haitao Yao <yao.e...@gmail.com>
Subject Socket timeout for BlockReaderLocal
Date Tue, 04 Dec 2012 07:09:28 GMT
Hi all,
	I'm using Hadoop 1.2.0 with java version "1.7.0_05".
	When I run my Pig script, the workers always report this error, and the MR jobs run very
slowly.
	Increasing the dfs.socket.timeout value does not help. The network is OK; telnet to port 50020
always succeeds.
	Here's the stack trace:
2012-12-04 14:29:41,323 INFO org.apache.hadoop.hdfs.DFSClient: Failed to read blk_-2337696885631113108_11054058
on local machinejava.net.SocketTimeoutException: Call to /10.130.110.80:50020 failed on socket
timeout exception: java.net.SocketTimeoutException: 10000 millis timeout while waiting for
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.130.110.80:57689
remote=/10.130.110.80:50020]
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1140)
	at org.apache.hadoop.ipc.Client.call(Client.java:1112)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
	at $Proxy3.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:392)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:374)
	at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:212)
	at org.apache.hadoop.hdfs.BlockReaderLocal$LocalDatanodeInfo.getDatanodeProxy(BlockReaderLocal.java:90)
	at org.apache.hadoop.hdfs.BlockReaderLocal$LocalDatanodeInfo.access$200(BlockReaderLocal.java:65)
	at org.apache.hadoop.hdfs.BlockReaderLocal.getBlockPathInfo(BlockReaderLocal.java:224)
	at org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:145)
	at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:509)
	at org.apache.hadoop.hdfs.DFSClient.access$800(DFSClient.java:78)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2231)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2384)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
	at org.apache.pig.impl.io.BufferedPositionedInputStream.read(BufferedPositionedInputStream.java:52)
	at org.apache.pig.impl.io.InterRecordReader.nextKeyValue(InterRecordReader.java:86)
	at org.apache.pig.impl.io.InterStorage.getNext(InterStorage.java:77)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.SocketTimeoutException: 10000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.130.110.80:57689
remote=/10.130.110.80:50020]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)

I checked the source code. The exception is thrown here, in SocketIOWithTimeout.doIO:
      //now wait for socket to be ready.
      int count = 0;
      try {
        count = selector.select(channel, ops, timeout);  
      } catch (IOException e) { //unexpected IOException.
        closed = true;
        throw e;
      } 

      if (count == 0) {
        // here!!
        throw new SocketTimeoutException(timeoutExceptionString(channel,
                                                                timeout, ops));
      }

	Why did the selector select nothing? The data node is not under heavy load, and GC and the
network are both fine.
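
	To make the symptom concrete, here is a minimal sketch using plain java.nio rather than Hadoop's own SocketIOWithTimeout. The class and method names (SelectTimeoutDemo, waitForRead, silentPeerResult) are made up for illustration. It shows that a select for OP_READ returns 0 when the connection is established but the peer sends no bytes within the timeout, which is consistent with telnet connecting fine while the RPC reply never arrives in time.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; not part of Hadoop.
public class SelectTimeoutDemo {

    /** Waits up to timeoutMs for the channel to become readable; returns the ready-key count. */
    static int waitForRead(SocketChannel channel, long timeoutMs) throws IOException {
        channel.configureBlocking(false); // a channel must be non-blocking to register
        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_READ);
            return selector.select(timeoutMs); // 0 means nothing became readable in time
        }
    }

    /** Connects to a loopback server that accepts the connection but never writes a reply. */
    static int silentPeerResult(long timeoutMs) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                // The TCP connection is fully established here, yet no data is ever sent.
                return waitForRead(client, timeoutMs);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("ready keys: " + silentPeerResult(200));
    }
}
```

	In this sketch the zero result says only that no response bytes showed up on an already connected socket; it says nothing about whether the port is reachable.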
	
	Thanks.
	

Haitao Yao
yao.erix@gmail.com
weibo: @haitao_yao
Skype:  haitao.yao.final

