hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: is my hbase cluster overloaded?
Date Wed, 23 Apr 2014 04:42:09 GMT
What makes you say this?

HBase generates a lot of very short-lived garbage (like KeyValue objects that do not
outlive an RPC request) and holds a lot of long-lived data in the memstore and the block
cache. We want to avoid accumulating the short-lived garbage and at the same time leave
most of the heap for the memstores and the block cache.
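
For context, a sketch of the 0.94-era hbase-site.xml settings that govern this heap
split; the fractions shown are the usual defaults and are illustrative, not advice for
this particular cluster:

<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <!-- illustrative: fraction of the heap all memstores together may use -->
  <value>0.4</value>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <!-- illustrative: fraction of the heap given to the block cache -->
  <value>0.25</value>
</property>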

A small eden size of 512 MB or even less makes sense to me.
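
A minimal sketch of how that might look in hbase-env.sh, assuming an 8 GB region
server heap like the -Xmx8000m quoted below; the CMS flags are common companions to a
small new generation and are illustrative, not taken from this thread:

# Illustrative only: small, fixed new generation plus CMS for the old
# generation; adjust the heap size and occupancy fraction to your cluster.
export HBASE_REGIONSERVER_OPTS="-server -Xms8g -Xmx8g \
  -XX:NewSize=512m -XX:MaxNewSize=512m \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70"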

-- Lars



----- Original Message -----
From: Azuryy Yu <azuryyyu@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Tuesday, April 22, 2014 12:02 AM
Subject: Re: is my hbase cluster overloaded?

Do you still have the same issue?

and:
-Xmx8000m -server -XX:NewSize=512m -XX:MaxNewSize=512m

the Eden size is too small.




On Tue, Apr 22, 2014 at 2:55 PM, Li Li <fancyerii@gmail.com> wrote:

> <property>
>   <name>dfs.datanode.handler.count</name>
>   <value>100</value>
>   <description>The number of server threads for the datanode.</description>
> </property>
>
>
> 1. namenode/master  192.168.10.48
> http://pastebin.com/7M0zzAAc
>
> $free -m (this is the value after restarting hadoop and hbase just now, not
> the value when it crashed)
>              total       used       free     shared    buffers     cached
> Mem:         15951       3819      12131          0        509       1990
> -/+ buffers/cache:       1319      14631
> Swap:         8191          0       8191
>
> 2. datanode/region 192.168.10.45
> http://pastebin.com/FiAw1yju
>
> $free -m
>              total       used       free     shared    buffers     cached
> Mem:         15951       3627      12324          0       1516        641
> -/+ buffers/cache:       1469      14482
> Swap:         8191          8       8183
>
> On Tue, Apr 22, 2014 at 2:29 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > One big possible issue is a high volume of concurrent requests on HDFS
> > or HBase: all datanode handlers become busy, more requests back up and
> > then time out. You can try to increase dfs.datanode.handler.count and
> > dfs.namenode.handler.count in hdfs-site.xml, then restart HDFS.
> >
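A hedged sketch of what that change could look like in hdfs-site.xml; the values are
placeholders that show the shape of the config, not tuned recommendations:

<property>
  <name>dfs.datanode.handler.count</name>
  <!-- illustrative value; the Hadoop 1.x default is much lower -->
  <value>64</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <!-- illustrative value; size this to the cluster's RPC load -->
  <value>64</value>
</property>
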
> > Also, what JVM options do your datanodes, namenode, and region servers
> > use? If they are all at the defaults, that can also cause this issue.
> >
> >
> >
> >
> > On Tue, Apr 22, 2014 at 2:20 PM, Li Li <fancyerii@gmail.com> wrote:
> >
> >> my cluster setup: all 6 machines are virtual machines. each machine has
> >> 4 CPU cores (Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz) and 16GB memory
> >> 192.168.10.48 namenode/jobtracker
> >> 192.168.10.47 secondary namenode
> >> 192.168.10.45 datanode/tasktracker
> >> 192.168.10.46 datanode/tasktracker
> >> 192.168.10.49 datanode/tasktracker
> >> 192.168.10.50 datanode/tasktracker
> >>
> >> hdfs logs around 20:33
> >> 192.168.10.48 namenode log  http://pastebin.com/rwgmPEXR
> >> 192.168.10.45 datanode log http://pastebin.com/HBgZ8rtV (I found this
> >> datanode crash first)
> >> 192.168.10.46 datanode log http://pastebin.com/aQ2emnUi
> >> 192.168.10.49 datanode log http://pastebin.com/aqsWrrL1
> >> 192.168.10.50 datanode log http://pastebin.com/V7C6tjpB
> >>
> >> hbase logs around 20:33
> >> 192.168.10.48 master log http://pastebin.com/2ZfeYA1p
> >> 192.168.10.45 region log http://pastebin.com/idCF2a7Y
> >> 192.168.10.46 region log http://pastebin.com/WEh4dA0f
> >> 192.168.10.49 region log http://pastebin.com/cGtpbTLz
> >> 192.168.10.50 region log http://pastebin.com/bD6h5T6p (very strange:
> >> no log at 20:33, but there are logs at 20:32 and 20:34)
> >>
> >> On Tue, Apr 22, 2014 at 12:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > Can you post more of the data node log, around 20:33 ?
> >> >
> >> > Cheers
> >> >
> >> >
> >> > On Mon, Apr 21, 2014 at 8:57 PM, Li Li <fancyerii@gmail.com> wrote:
> >> >
> >> >> hadoop 1.0
> >> >> hbase 0.94.11
> >> >>
> >> >> datanode log from 192.168.10.45. why did it shut itself down?
> >> >>
> >> >> 2014-04-21 20:33:59,309 INFO
> >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> >> >> blk_-7969006819959471805_202154 received exception
> >> >> java.io.InterruptedIOException: Interruped while waiting for IO on
> >> >> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
> >> >> left.
> >> >> 2014-04-21 20:33:59,310 ERROR
> >> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> >> DatanodeRegistration(192.168.10.45:50010,
> >> >> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
> >> >> infoPort=50075, ipcPort=50020):DataXceiver
> >> >> java.io.InterruptedIOException: Interruped while waiting for IO on
> >> >> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
> >> >> left.
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >> >>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> >> >>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> >> >>         at java.io.DataInputStream.read(DataInputStream.java:149)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:265)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> >> >>         at java.lang.Thread.run(Thread.java:722)
> >> >> 2014-04-21 20:33:59,310 ERROR
> >> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> >> DatanodeRegistration(192.168.10.45:50010,
> >> >> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
> >> >> infoPort=50075, ipcPort=50020):DataXceiver
> >> >> java.io.InterruptedIOException: Interruped while waiting for IO on
> >> >> channel java.nio.channels.SocketChannel[closed]. 466924 millis timeout left.
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:245)
> >> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> >> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> >> >>         at java.lang.Thread.run(Thread.java:722)
> >> >> 2014-04-21 20:34:00,291 INFO
> >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for
> >> >> threadgroup to exit, active threads is 0
> >> >> 2014-04-21 20:34:00,404 INFO
> >> >> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
> >> >> Shutting down all async disk service threads...
> >> >> 2014-04-21 20:34:00,405 INFO
> >> >> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All
> >> >> async disk service threads have been shut down.
> >> >> 2014-04-21 20:34:00,413 INFO
> >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> >> >> 2014-04-21 20:34:00,424 INFO
> >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> >> >> /************************************************************
> >> >> SHUTDOWN_MSG: Shutting down DataNode at app-hbase-1/192.168.10.45
> >> >> ************************************************************/
> >> >>
> >> >> On Tue, Apr 22, 2014 at 11:25 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> >> > bq. one datanode failed
> >> >> >
> >> >> > Was the crash due to an out of memory error?
> >> >> > Can you post the tail of the data node log on pastebin?
> >> >> >
> >> >> > Giving us the versions of hadoop and hbase would be helpful.
> >> >> >
> >> >> >
> >> >> > On Mon, Apr 21, 2014 at 7:39 PM, Li Li <fancyerii@gmail.com> wrote:
> >> >> >
> >> >> >> I have a small hbase cluster with 1 namenode, 1 secondary namenode,
> >> >> >> and 4 datanodes. The hbase master is on the same machine as the
> >> >> >> namenode, and the 4 hbase slaves are on the datanode machines.
> >> >> >> The average is about 10,000 requests per second, and the cluster
> >> >> >> crashed. I found the reason is that one datanode failed.
> >> >> >>
> >> >> >> the datanode configuration is about 4 CPU cores and 10GB of memory.
> >> >> >> is my cluster overloaded?
> >> >> >>
> >> >>
> >>
>

