hbase-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: is my hbase cluster overloaded?
Date Tue, 22 Apr 2014 06:29:39 GMT
One big possible issue is that you have highly concurrent requests against
HDFS or HBase: all of the datanode handlers become busy, more requests
queue up behind them, and they eventually time out. You can try increasing
dfs.datanode.handler.count and dfs.namenode.handler.count in
hdfs-site.xml, then restart HDFS.
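
For example, something like this in hdfs-site.xml (the values here are only
illustrative starting points, not tuned recommendations; check what your
cluster currently uses before raising them):

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>30</value>
  </property>

Push the same hdfs-site.xml to the namenode and every datanode before
restarting, since each daemon reads only its own copy.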

Another thing: have you set JVM options for the datanodes, the namenode,
and the region servers? If they are all running with the default heap
settings, that can also cause this kind of problem.
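
A minimal sketch of what that could look like, assuming the standard
hadoop-env.sh / hbase-env.sh variables for hadoop 1.0 and hbase 0.94; the
heap sizes below are only placeholders and have to be sized against the
16GB (or 10GB) of RAM actually available on each box:

  # hadoop-env.sh (illustrative values only)
  export HADOOP_NAMENODE_OPTS="-Xmx4g -XX:+UseConcMarkSweepGC $HADOOP_NAMENODE_OPTS"
  export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"

  # hbase-env.sh (illustrative values only)
  export HBASE_HEAPSIZE=8000
  export HBASE_REGIONSERVER_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 $HBASE_REGIONSERVER_OPTS"

Leave enough headroom for the datanode process and the OS page cache on the
machines that run both a datanode and a region server.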




On Tue, Apr 22, 2014 at 2:20 PM, Li Li <fancyerii@gmail.com> wrote:

> my cluster setup: all 6 machines are virtual machines. each machine has
> 4 CPUs (Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz) and 16GB of memory.
> 192.168.10.48 namenode/jobtracker
> 192.168.10.47 secondary namenode
> 192.168.10.45 datanode/tasktracker
> 192.168.10.46 datanode/tasktracker
> 192.168.10.49 datanode/tasktracker
> 192.168.10.50 datanode/tasktracker
>
> hdfs logs around 20:33
> 192.168.10.48 namenode log  http://pastebin.com/rwgmPEXR
> 192.168.10.45 datanode log http://pastebin.com/HBgZ8rtV (this is the
> datanode I found crashed first)
> 192.168.10.46 datanode log http://pastebin.com/aQ2emnUi
> 192.168.10.49 datanode log http://pastebin.com/aqsWrrL1
> 192.168.10.50 datanode log http://pastebin.com/V7C6tjpB
>
> hbase logs around 20:33
> 192.168.10.48 master log http://pastebin.com/2ZfeYA1p
> 192.168.10.45 region log http://pastebin.com/idCF2a7Y
> 192.168.10.46 region log http://pastebin.com/WEh4dA0f
> 192.168.10.49 region log http://pastebin.com/cGtpbTLz
> 192.168.10.50 region log http://pastebin.com/bD6h5T6p (strangely, there is
> no log entry at 20:33, but there are entries at 20:32 and 20:34)
>
> On Tue, Apr 22, 2014 at 12:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > Can you post more of the data node log, around 20:33?
> >
> > Cheers
> >
> >
> > On Mon, Apr 21, 2014 at 8:57 PM, Li Li <fancyerii@gmail.com> wrote:
> >
> >> hadoop 1.0
> >> hbase 0.94.11
> >>
> >> datanode log from 192.168.10.45. why did it shut itself down?
> >>
> >> 2014-04-21 20:33:59,309 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7969006819959471805_202154 received exception java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
> >> 2014-04-21 20:33:59,310 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.10.45:50010, storageID=DS-1676697306-192.168.10.45-50010-1392029190949, infoPort=50075, ipcPort=50020):DataXceiver
> >> java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> >>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> >>         at java.io.DataInputStream.read(DataInputStream.java:149)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:265)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> >>         at java.lang.Thread.run(Thread.java:722)
> >> 2014-04-21 20:33:59,310 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.10.45:50010, storageID=DS-1676697306-192.168.10.45-50010-1392029190949, infoPort=50075, ipcPort=50020):DataXceiver
> >> java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 466924 millis timeout left.
> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:245)
> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> >>         at java.lang.Thread.run(Thread.java:722)
> >> 2014-04-21 20:34:00,291 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
> >> 2014-04-21 20:34:00,404 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
> >> 2014-04-21 20:34:00,405 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
> >> 2014-04-21 20:34:00,413 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> >> 2014-04-21 20:34:00,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> >> /************************************************************
> >> SHUTDOWN_MSG: Shutting down DataNode at app-hbase-1/192.168.10.45
> >> ************************************************************/
> >>
> >> On Tue, Apr 22, 2014 at 11:25 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > bq. one datanode failed
> >> >
> >> > Was the crash due to an out-of-memory error?
> >> > Can you post the tail of the data node log on pastebin?
> >> >
> >> > Giving us the versions of hadoop and hbase would be helpful.
> >> >
> >> >
> >> > On Mon, Apr 21, 2014 at 7:39 PM, Li Li <fancyerii@gmail.com> wrote:
> >> >
> >> >> I have a small hbase cluster with 1 namenode, 1 secondary namenode,
> >> >> and 4 datanodes. The hbase master is on the same machine as the
> >> >> namenode, and the 4 hbase region servers are on the datanode machines.
> >> >> I found the average requests per second is about 10,000, and the
> >> >> cluster crashed. I found the reason is that one datanode failed.
> >> >>
> >> >> Each datanode has about 4 CPU cores and 10GB of memory.
> >> >> Is my cluster overloaded?
> >> >>
> >>
>
