hbase-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: is my hbase cluster overloaded?
Date Thu, 24 Apr 2014 02:11:44 GMT
Lars,

I can't agree with that. I don't think the memstore should end up in the old
gen here: on a write-heavy cluster, memstore data is kept only a short time
before it is flushed to disk, so with a small Eden much of it is promoted
prematurely, which easily leads to Full GCs.

So for the case in this thread, a write-heavy cluster, Eden should be large,
and MaxTenuringThreshold (MTT) should also be a little larger, to avoid
promotion.
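For a write-heavy cluster along these lines, the region server options in hbase-env.sh might look roughly like the sketch below. This is only an illustration of the large-Eden / larger-MTT idea, not a recommendation from this thread; the specific sizes (2g new generation, MTT of 3, CMS as the collector, which was the common choice for HBase 0.94-era heaps) are assumptions to be tuned against your own GC logs.

```shell
# hbase-env.sh (illustrative sketch only; all sizes are assumptions)
# Large young generation so short-lived memstore/RPC garbage dies in Eden
# instead of being promoted, plus a slightly raised tenuring threshold.
export HBASE_REGIONSERVER_OPTS="-Xms8g -Xmx8g \
  -XX:NewSize=2g -XX:MaxNewSize=2g \
  -XX:MaxTenuringThreshold=3 \
  -XX:SurvivorRatio=4 \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```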



On Wed, Apr 23, 2014 at 12:42 PM, lars hofhansl <larsh@apache.org> wrote:

> What makes you say this?
>
> HBase has a lot of very short lived garbage (like KeyValue objects that do
> not outlive an RPC request) and a lot of long lived data in the memstore
> and the block cache. We want to avoid accumulating the short lived garbage
> and at the same time leave most heap for memstores and blockcache.
>
> A small eden size of 512mb or even less makes sense to me.
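Whether a small or a large Eden wins for a given workload can be checked directly from GC logs rather than argued from first principles. A sketch of the pre-JDK-9 HotSpot logging flags one might append in hbase-env.sh; the log path is a placeholder:

```shell
# Illustrative GC-logging flags (HotSpot before JDK 9); log path is a placeholder.
# PrintTenuringDistribution shows object ages in the survivor spaces, which
# reveals whether short-lived garbage survives long enough to be promoted.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -XX:+PrintTenuringDistribution \
  -Xloggc:/path/to/gc-regionserver.log"
```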
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Azuryy Yu <azuryyyu@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Tuesday, April 22, 2014 12:02 AM
> Subject: Re: is my hbase cluster overloaded?
>
> Do you still have the same issue?
>
> Also, regarding:
> -Xmx8000m -server -XX:NewSize=512m -XX:MaxNewSize=512m
>
> the Eden size is too small.
>
>
>
>
> On Tue, Apr 22, 2014 at 2:55 PM, Li Li <fancyerii@gmail.com> wrote:
>
> > <property>
> >   <name>dfs.datanode.handler.count</name>
> >   <value>100</value>
> >   <description>The number of server threads for the
> datanode.</description>
> > </property>
> >
> >
> > 1. namenode/master  192.168.10.48
> > http://pastebin.com/7M0zzAAc
> >
> > $free -m (these are the values now, after I restarted hadoop and hbase,
> > not the values when it crashed)
> >              total       used       free     shared    buffers     cached
> > Mem:         15951       3819      12131          0        509       1990
> > -/+ buffers/cache:       1319      14631
> > Swap:         8191          0       8191
> >
> > 2. datanode/region 192.168.10.45
> > http://pastebin.com/FiAw1yju
> >
> > $free -m
> >              total       used       free     shared    buffers     cached
> > Mem:         15951       3627      12324          0       1516        641
> > -/+ buffers/cache:       1469      14482
> > Swap:         8191          8       8183
> >
> > On Tue, Apr 22, 2014 at 2:29 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > > One likely issue is high concurrent load on HDFS or HBase: when all
> > > datanode handler threads are busy, further requests queue up and
> > > eventually time out. You can try increasing
> > > dfs.datanode.handler.count and dfs.namenode.handler.count in
> > > hdfs-site.xml, then restart HDFS.
> > >
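For reference, the hdfs-site.xml change being suggested might look like the sketch below; the value 100 is an assumption for illustration (matching the datanode setting quoted earlier in the thread), not a tested recommendation:

```xml
<!-- hdfs-site.xml: illustrative values only; tune for your workload -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>100</value>
  <description>Number of server threads for the datanode.</description>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
  <description>Number of RPC server threads for the namenode.</description>
</property>
```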
> > > Also, what JVM options do your datanode, namenode, and region server
> > > processes use? If they are all at the defaults, that could also cause
> > > this issue.
> > >
> > >
> > >
> > >
> > > On Tue, Apr 22, 2014 at 2:20 PM, Li Li <fancyerii@gmail.com> wrote:
> > >
> > >> my cluster setup: all 6 machines are virtual machines. each machine has
> > >> 4 CPU cores (Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz) and 16GB memory:
> > >> 192.168.10.48 namenode/jobtracker
> > >> 192.168.10.47 secondary namenode
> > >> 192.168.10.45 datanode/tasktracker
> > >> 192.168.10.46 datanode/tasktracker
> > >> 192.168.10.49 datanode/tasktracker
> > >> 192.168.10.50 datanode/tasktracker
> > >>
> > >> hdfs logs around 20:33
> > >> 192.168.10.48 namenode log  http://pastebin.com/rwgmPEXR
> > >> 192.168.10.45 datanode log http://pastebin.com/HBgZ8rtV (I found that
> > >> this datanode crashed first)
> > >> 192.168.10.46 datanode log http://pastebin.com/aQ2emnUi
> > >> 192.168.10.49 datanode log http://pastebin.com/aqsWrrL1
> > >> 192.168.10.50 datanode log http://pastebin.com/V7C6tjpB
> > >>
> > >> hbase logs around 20:33
> > >> 192.168.10.48 master log http://pastebin.com/2ZfeYA1p
> > >> 192.168.10.45 region log http://pastebin.com/idCF2a7Y
> > >> 192.168.10.46 region log http://pastebin.com/WEh4dA0f
> > >> 192.168.10.49 region log http://pastebin.com/cGtpbTLz
> > >> 192.168.10.50 region log http://pastebin.com/bD6h5T6p (very strange:
> > >> no log at 20:33, but there are logs at 20:32 and 20:34)
> > >>
> > >> On Tue, Apr 22, 2014 at 12:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > Can you post more of the data node log, around 20:33 ?
> > >> >
> > >> > Cheers
> > >> >
> > >> >
> > >> > On Mon, Apr 21, 2014 at 8:57 PM, Li Li <fancyerii@gmail.com> wrote:
> > >> >
> > >> >> hadoop 1.0
> > >> >> hbase 0.94.11
> > >> >>
> > >> >> datanode log from 192.168.10.45. why did it shut itself down?
> > >> >>
> > >> >> 2014-04-21 20:33:59,309 INFO
> > >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
> > >> >> blk_-7969006819959471805_202154 received exception
> > >> >> java.io.InterruptedIOException: Interruped while waiting for IO on
> > >> >> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
> > >> >> 2014-04-21 20:33:59,310 ERROR
> > >> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> > >> >> DatanodeRegistration(192.168.10.45:50010,
> > >> >> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
> > >> >> infoPort=50075, ipcPort=50020):DataXceiver
> > >> >> java.io.InterruptedIOException: Interruped while waiting for IO on
> > >> >> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
> > >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> > >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> > >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > >> >>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> > >> >>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> > >> >>         at java.io.DataInputStream.read(DataInputStream.java:149)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:265)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> > >> >>         at java.lang.Thread.run(Thread.java:722)
> > >> >> 2014-04-21 20:33:59,310 ERROR
> > >> >> org.apache.hadoop.hdfs.server.datanode.DataNode:
> > >> >> DatanodeRegistration(192.168.10.45:50010,
> > >> >> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
> > >> >> infoPort=50075, ipcPort=50020):DataXceiver
> > >> >> java.io.InterruptedIOException: Interruped while waiting for IO on
> > >> >> channel java.nio.channels.SocketChannel[closed]. 466924 millis timeout left.
> > >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
> > >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:245)
> > >> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> > >> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
> > >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
> > >> >>         at java.lang.Thread.run(Thread.java:722)
> > >> >> 2014-04-21 20:34:00,291 INFO
> > >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for
> > >> >> threadgroup to exit, active threads is 0
> > >> >> 2014-04-21 20:34:00,404 INFO
> > >> >> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
> > >> >> Shutting down all async disk service threads...
> > >> >> 2014-04-21 20:34:00,405 INFO
> > >> >> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All
> > >> >> async disk service threads have been shut down.
> > >> >> 2014-04-21 20:34:00,413 INFO
> > >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> > >> >> 2014-04-21 20:34:00,424 INFO
> > >> >> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> > >> >> /************************************************************
> > >> >> SHUTDOWN_MSG: Shutting down DataNode at app-hbase-1/192.168.10.45
> > >> >> ************************************************************/
> > >> >>
> > >> >> On Tue, Apr 22, 2014 at 11:25 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> >> > bq. one datanode failed
> > >> >> >
> > >> >> > Was the crash due to out of memory error ?
> > >> >> > Can you post the tail of data node log on pastebin ?
> > >> >> >
> > >> >> > Giving us versions of hadoop and hbase would be helpful.
> > >> >> >
> > >> >> >
> > >> >> > On Mon, Apr 21, 2014 at 7:39 PM, Li Li <fancyerii@gmail.com> wrote:
> > >> >> >
> > >> >> >> I have a small hbase cluster with 1 namenode, 1 secondary
> > >> >> >> namenode, and 4 datanodes. The hbase master is on the same
> > >> >> >> machine as the namenode, and the 4 hbase slaves are on the
> > >> >> >> datanode machines.
> > >> >> >> I found the average requests per second was about 10,000, and
> > >> >> >> then the cluster crashed. I found the reason was that one
> > >> >> >> datanode failed.
> > >> >> >>
> > >> >> >> Each datanode has about 4 CPU cores and 10GB memory.
> > >> >> >> Is my cluster overloaded?
> > >> >> >>
> > >> >>
> > >>
> >
>
>
