hbase-user mailing list archives

From "hankedang@sina.cn" <hanked...@sina.cn>
Subject Re: Re: hbase cannot normally start regionserver in the environment of big data.
Date Fri, 07 Nov 2014 15:07:12 GMT
Hi,
There is no mistake in the basic configuration.
The cluster ran normally for a long time and has stored a certain amount of data.
The problem appears whenever I restart the HBase service.
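The java.net.BindException ("Cannot assign requested address") in the regionserver trace quoted below often points to ephemeral-port exhaustion when many connections are opened at once. A quick way to check on each node (a sketch, assuming Linux paths; not HBase-specific tooling):

```shell
# Check for ephemeral-port exhaustion on a node (Linux; /proc paths assumed).

# Usable range of client-side (ephemeral) ports, e.g. "32768 60999":
port_range=$(cat /proc/sys/net/ipv4/ip_local_port_range)
echo "ephemeral port range: $port_range"

# Sockets stuck in TIME_WAIT; each one holds a local port for about 60s
# after close, so tens of thousands of them during a mass region open can
# exhaust the range and produce "Cannot assign requested address".
tw_count=$(ss -tan state time-wait 2>/dev/null | tail -n +2 | wc -l)
echo "TIME_WAIT sockets: $tw_count"
```

If the TIME_WAIT count is close to the size of the port range during startup, that supports the port-exhaustion theory.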



hankedang@sina.cn
 
From: Jean-Marc Spaggiari
Date: 2014-11-07 22:45
To: user
CC: yuzhihong
Subject: Re: hbase cannot normally start regionserver in the environment of big data.
What are you hosts names and what is in your /etc/hosts file?
 
Can you dig, dig -X and ping all your hosts including the master?
 
Is hostname returned value mapped correctly to the IP?
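A minimal sketch of those checks (the host names are placeholders for your own; `getent` is used so /etc/hosts is consulted as well as DNS):

```shell
# Verify forward resolution for every node in the cluster, including the
# master. Host names below are placeholders -- substitute your own.
HOSTS="master salve1 salve2 salve3 salve4"

check_host() {
  ip=$(getent hosts "$1" | awk '{print $1; exit}')
  if [ -z "$ip" ]; then
    echo "$1: does not resolve"
    return 1
  fi
  echo "$1 -> $ip"
}

for h in $HOSTS; do
  check_host "$h" || true   # keep going even if one host fails
done

# Also confirm this machine's own hostname maps to a routable IP, not to
# 127.0.0.1 (a common /etc/hosts mistake that confuses HBase):
echo "local: $(hostname) -> $(getent hosts "$(hostname)" | awk '{print $1; exit}')"
```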
 
JM
 
2014-11-07 9:37 GMT-05:00 hankedang@sina.cn <hankedang@sina.cn>:
 
> Hi,
>
>     We are using HBase 0.96 and Hadoop 2.3.
>     There is no exception information in the master log.
>
>     regionserver WARN logs:
>     2014-11-07 15:13:19,512 WARN org.apache.hadoop.hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> java.net.BindException: Cannot assign requested address
> at sun.nio.ch.Net.connect0(Native Method)
> at sun.nio.ch.Net.connect(Net.java:465)
> at sun.nio.ch.Net.connect(Net.java:457)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:666)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2764)
> at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:746)
> at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:661)
> at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:793)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:418)
>
>
>
> hankedang@sina.cn
>
> From: Ted Yu
> Date: 2014-11-07 21:28
> To: user@hbase.apache.org
> CC: user
> Subject: Re: hbase cannot normally start regionserver in the environment of big data.
> Please pastebin log from region server around the time it became dead.
>
> What hbase / Hadoop version are you using ?
>
> Anything interesting in master log ?
>
> Thanks
>
> On Nov 7, 2014, at 4:57 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> wrote:
>
> > Hi,
> >
> > Have you checked that your Hadoop is running fine? Have you checked that
> > the network between your servers is fine too?
> >
> > JM
> >
> > 2014-11-07 5:22 GMT-05:00 hankedang@sina.cn <hankedang@sina.cn>:
> >
> >>     I've deployed a "2+4" cluster that had been running normally for a
> >> long time and holds more than 40 TB of data. When I deliberately shut down
> >> the HBase service and try to restart it, the regionservers die.
> >>
> >>    The regionserver log shows that all regions are opened, but the
> >> datanode logs contain WARN and ERROR entries. The details are below:
> >>
> >>    2014-11-07 14:47:21,584 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.9:39405, bytes: 4696, op: HDFS_READ, cliID: DFSClient_hb_rs_salve1,60020,1415342303886_-2037622978_29, offset: 31996928, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078709392_4968828, duration: 7978822
> >>    2014-11-07 14:47:21,596 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
> >>    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
> >>    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> >>    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
> >>    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
> >>    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
> >>    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
> >>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
> >>    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
> >>    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> >>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
> >>    at java.lang.Thread.run(Thread.java:744)
> >> 2014-11-07 14:47:21,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.11:41511, bytes: 726528, op: HDFS_READ, cliID: DFSClient_hb_rs_salve3,60020,1415342303807_1094119849_29, offset: 0, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168, duration: 480190668115
> >> 2014-11-07 14:47:21,599 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.230.63.12, datanodeUuid=bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=cluster12;nsid=395652542;c=0):Got exception while serving BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168 to /10.230.63.11:41511
> >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
> >> at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> >> at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
> >> at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
> >> at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
> >> at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
> >> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
> >> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
> >> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> >> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
> >> at java.lang.Thread.run(Thread.java:744)
> >> 2014-11-07 14:47:21,600 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: salve4:50010:DataXceiver error processing READ_BLOCK operation src: /10.230.63.11:41511 dest: /10.230.63.12:50010
> >>
> >>
> >>    I suspect this is caused by the load-on-open stage, during which the
> >> cluster's disk I/O is very high and the load is heavy.
> >>
> >>    What causes the errors while reading the HFiles, and what leads to the
> >> timeouts? Is there any way to throttle the load-on-open stage and reduce
> >> the pressure on the cluster?
> >>
> >> I need help !
> >>
> >> Thanks!
> >>
> >>
> >>
> >>
> >> hankedang@sina.cn
> >>
>
>
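On the load-on-open question quoted above: two settings that are sometimes examined in this situation are the DataNode write timeout that produced the 480000 ms SocketTimeoutException, and the number of threads a regionserver uses to open regions. Property names below are taken from Hadoop 2.x and HBase 0.96 defaults; verify them against your versions, and treat the values as illustrative, not recommendations:

```xml
<!-- hdfs-site.xml: the 480000 ms (8 min) timeout in the datanode log is the
     default dfs.datanode.socket.write.timeout; it can be raised if reads
     are merely slow under startup load rather than stuck. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>

<!-- hbase-site.xml: fewer open-region handler threads means fewer regions
     opening concurrently, lowering disk I/O pressure at startup. -->
<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>1</value>
</property>
```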