hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: hbase cannot normally start regionserver in the environment of big data.
Date Fri, 07 Nov 2014 12:57:26 GMT
Hi,

Have you checked that your Hadoop cluster is running fine? Have you checked that
the network between your servers is fine too?
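
If it helps, a few standard commands give a quick first check. This is only a rough
sketch, assuming a Hadoop 2.x CLI and the default /hbase root directory; adjust the
paths and hosts to your setup:

    # Overall HDFS health: live/dead datanodes, remaining capacity
    hdfs dfsadmin -report

    # Look for missing or corrupt blocks under the HBase root directory
    hdfs fsck /hbase -files -blocks -locations

    # Basic network sanity between the nodes that appear in the timeouts
    ping -c 5 10.230.63.11
    ping -c 5 10.230.63.12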

JM

2014-11-07 5:22 GMT-05:00 hankedang@sina.cn <hankedang@sina.cn>:

>      I've deployed a "2+4" cluster which has been running normally for a
> long time. The cluster holds more than 40 TB of data. When I deliberately shut
> down the HBase service and try to restart it, the regionservers die.
>
>     The regionserver log shows that all the regions are opened, but the
> datanode logs contain WARN and ERROR messages.
>     Below is the log for details:
>
>     2014-11-07 14:47:21,584 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.9:39405, bytes: 4696, op: HDFS_READ, cliID: DFSClient_hb_rs_salve1,60020,1415342303886_-2037622978_29, offset: 31996928, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078709392_4968828, duration: 7978822
>     2014-11-07 14:47:21,596 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
>     java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
>         at java.lang.Thread.run(Thread.java:744)
>     2014-11-07 14:47:21,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.11:41511, bytes: 726528, op: HDFS_READ, cliID: DFSClient_hb_rs_salve3,60020,1415342303807_1094119849_29, offset: 0, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168, duration: 480190668115
>     2014-11-07 14:47:21,599 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.230.63.12, datanodeUuid=bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=cluster12;nsid=395652542;c=0):Got exception while serving BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168 to /10.230.63.11:41511
>     java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
>         at java.lang.Thread.run(Thread.java:744)
>     2014-11-07 14:47:21,600 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: salve4:50010:DataXceiver error processing READ_BLOCK operation src: /10.230.63.11:41511 dest: /10.230.63.12:50010
>
>
>     I personally think this is caused by the load during the region-open stage,
> when the disk I/O of the cluster can be very high and the pressure can be huge.
>
>     I wonder what causes the read errors while reading the HFiles, and what
> leads to the timeout. Are there any solutions that can control the speed of
> region opening and reduce the pressure on the cluster?
>
> I need help!
>
> Thanks!
>
>
>
>
> hankedang@sina.cn
>
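
On the throttling question above: there is no single knob that slows down region
opening, but a couple of settings are commonly tuned for this kind of startup
load. This is only a sketch, assuming an HBase 0.96/0.98-era setup; the values
below are illustrative, so please verify the defaults against your versions:

    <!-- hbase-site.xml: number of threads the regionserver uses to open
         regions in parallel (default 3); lowering it spreads the open
         load out over time -->
    <property>
      <name>hbase.regionserver.executor.openregion.threads</name>
      <value>1</value>
    </property>

    <!-- hdfs-site.xml: the 480000 ms in the stack trace is the datanode's
         default write timeout (dfs.datanode.socket.write.timeout); raising
         it mostly hides the symptom, so also make sure the datanodes have
         enough transfer threads for the open storm -->
    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>8192</value>
    </property>

For what it's worth, the timeouts in that log mean the datanode waited 8 minutes
for the regionserver's socket to become writable, which usually points to the
reading side being overloaded rather than the datanode itself.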
