hbase-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: is my hbase cluster overloaded?
Date Tue, 22 Apr 2014 08:11:07 GMT
jmap -heap output for app-hbase-1. Why is OldSize so small?

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 8388608000 (8000.0MB)
   NewSize          = 536870912 (512.0MB)
   MaxNewSize       = 536870912 (512.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 483196928 (460.8125MB)
   used     = 263917680 (251.69151306152344MB)
   free     = 219279248 (209.12098693847656MB)
   54.6190724126458% used
Eden Space:
   capacity = 429522944 (409.625MB)
   used     = 242509200 (231.27479553222656MB)
   free     = 187013744 (178.35020446777344MB)
   56.460127075307064% used
From Space:
   capacity = 53673984 (51.1875MB)
   used     = 21408480 (20.416717529296875MB)
   free     = 32265504 (30.770782470703125MB)
   39.88613925137363% used
To Space:
   capacity = 53673984 (51.1875MB)
   used     = 0 (0.0MB)
   free     = 53673984 (51.1875MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 2857693184 (2725.30859375MB)
   used     = 1738653912 (1658.1095809936523MB)
   free     = 1119039272 (1067.1990127563477MB)
   60.841168034923655% used
Perm Generation:
   capacity = 40685568 (38.80078125MB)
   used     = 24327624 (23.20063018798828MB)
   free     = 16357944 (15.600151062011719MB)
   59.79423465342797% used
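(If I read the output right, the OldSize line is just the value of the -XX:OldSize JVM
flag, which was left at its default here; the tenured space actually in use is the
"concurrent mark-sweep generation" section above, about 2.7GB committed out of the 8GB
max. A rough way to watch the generations over time, assuming jstat is available and
<pid> is the region server's process id, would be something like:

    jstat -gccapacity <pid> 5s    # young/old/perm generation capacities
    jstat -gcutil <pid> 5s        # utilisation percentages and GC counts

This is only a sketch for sanity-checking the generation sizes.)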


On Tue, Apr 22, 2014 at 4:00 PM, Li Li <fancyerii@gmail.com> wrote:
> Region Servers
> ServerName Start time Load
> app-hbase-1,60020,1398141516916 Tue Apr 22 12:38:36 CST 2014
> requestsPerSecond=448799, numberOfOnlineRegions=8, usedHeapMB=1241,
> maxHeapMB=7948
> app-hbase-2,60020,1398141516914 Tue Apr 22 12:38:36 CST 2014
> requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=829,
> maxHeapMB=7948
> app-hbase-4,60020,1398141525533 Tue Apr 22 12:38:45 CST 2014
> requestsPerSecond=217349, numberOfOnlineRegions=6, usedHeapMB=1356,
> maxHeapMB=7948
> app-hbase-5,60020,1398141524870 Tue Apr 22 12:38:44 CST 2014
> requestsPerSecond=844, numberOfOnlineRegions=3, usedHeapMB=285,
> maxHeapMB=7948
> Total: servers: 4 requestsPerSecond=666992, numberOfOnlineRegions=20
>
> hbase-2 and 4 only have 3 regions. How can I balance them?
>
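(On "how to balance them": a sketch of what one could try from the HBase shell; make
sure the balancer is switched on, trigger a run, and if needed move a region by hand.
The encoded region name below is a placeholder; the server name is taken from the
listing above:

    hbase> balance_switch true      # returns the previous state of the balancer
    hbase> balancer                 # ask the master to run a balance pass now
    hbase> move 'ENCODED_REGION_NAME', 'app-hbase-2,60020,1398141516914'

Whether the balancer actually moves anything depends on its configured slop/threshold,
so treat this only as a starting point.)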
> On Tue, Apr 22, 2014 at 3:53 PM, Li Li <fancyerii@gmail.com> wrote:
>> hbase current statistics:
>>
>> Region Servers
>> ServerName Start time Load
>> app-hbase-1,60020,1398141516916 Tue Apr 22 12:38:36 CST 2014
>> requestsPerSecond=6100, numberOfOnlineRegions=7, usedHeapMB=1201,
>> maxHeapMB=7948
>> app-hbase-2,60020,1398141516914 Tue Apr 22 12:38:36 CST 2014
>> requestsPerSecond=1770, numberOfOnlineRegions=4, usedHeapMB=224,
>> maxHeapMB=7948
>> app-hbase-4,60020,1398141525533 Tue Apr 22 12:38:45 CST 2014
>> requestsPerSecond=3445, numberOfOnlineRegions=5, usedHeapMB=798,
>> maxHeapMB=7948
>> app-hbase-5,60020,1398141524870 Tue Apr 22 12:38:44 CST 2014
>> requestsPerSecond=57, numberOfOnlineRegions=2, usedHeapMB=328,
>> maxHeapMB=7948
>> Total: servers: 4 requestsPerSecond=11372, numberOfOnlineRegions=18
>>
>> On Tue, Apr 22, 2014 at 3:40 PM, Li Li <fancyerii@gmail.com> wrote:
>>> I have restarted the server now and it is running. Maybe in an hour the
>>> load will become high again.
>>>
>>> On Tue, Apr 22, 2014 at 3:02 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>>>> Do you still have the same issue?
>>>>
>>>> and:
>>>> -Xmx8000m -server -XX:NewSize=512m -XX:MaxNewSize=512m
>>>>
>>>> the Eden size is too small.
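(If the young generation were to be enlarged, the usual place for region server JVM
flags is conf/hbase-env.sh. A sketch only; the sizes are illustrative, not values
recommended anywhere in this thread:

    # hbase-env.sh: give the region server a larger young generation so
    # short-lived request data is collected before reaching the CMS old gen
    export HBASE_REGIONSERVER_OPTS="-Xmx8000m -Xms8000m \
        -XX:NewSize=2g -XX:MaxNewSize=2g \
        -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"

After a restart, jmap -heap should report the new NewSize/MaxNewSize.)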
>>>>
>>>>
>>>>
>>>> On Tue, Apr 22, 2014 at 2:55 PM, Li Li <fancyerii@gmail.com> wrote:
>>>>
>>>>> <property>
>>>>>   <name>dfs.datanode.handler.count</name>
>>>>>   <value>100</value>
>>>>>   <description>The number of server threads for the datanode.</description>
>>>>> </property>
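(The companion setting mentioned below in the thread, dfs.namenode.handler.count, lives
in the same hdfs-site.xml; a sketch with an illustrative value, not one taken from this
cluster:

    <property>
      <name>dfs.namenode.handler.count</name>
      <value>64</value>
      <description>The number of server threads for the namenode.</description>
    </property>

Changing either handler count requires restarting the affected daemons.)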
>>>>>
>>>>>
>>>>> 1. namenode/master  192.168.10.48
>>>>> http://pastebin.com/7M0zzAAc
>>>>>
>>>>> $ free -m (these are the values right after I restarted hadoop and hbase, not
>>>>> the values when it crashed)
>>>>>              total       used       free     shared    buffers     cached
>>>>> Mem:         15951       3819      12131          0        509       1990
>>>>> -/+ buffers/cache:       1319      14631
>>>>> Swap:         8191          0       8191
>>>>>
>>>>> 2. datanode/region 192.168.10.45
>>>>> http://pastebin.com/FiAw1yju
>>>>>
>>>>> $free -m
>>>>>              total       used       free     shared    buffers     cached
>>>>> Mem:         15951       3627      12324          0       1516        641
>>>>> -/+ buffers/cache:       1469      14482
>>>>> Swap:         8191          8       8183
>>>>>
>>>>> On Tue, Apr 22, 2014 at 2:29 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>>>>> > One big possible issue is a high volume of concurrent requests on HDFS
>>>>> > or HBase: all datanode handler threads become busy, more requests queue
>>>>> > up, and they eventually time out. You can try increasing
>>>>> > dfs.datanode.handler.count and dfs.namenode.handler.count in
>>>>> > hdfs-site.xml, then restart HDFS.
>>>>> >
>>>>> > Also, have you set JVM options for the datanodes, namenode, and region
>>>>> > servers? If they are all left at the defaults, that can also cause this
>>>>> > issue.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Apr 22, 2014 at 2:20 PM, Li Li <fancyerii@gmail.com> wrote:
>>>>> >
>>>>> >> my cluster setup: all 6 machines are virtual machines. Each machine has
>>>>> >> 4 CPUs (Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz) and 16GB memory
>>>>> >> 192.168.10.48 namenode/jobtracker
>>>>> >> 192.168.10.47 secondary namenode
>>>>> >> 192.168.10.45 datanode/tasktracker
>>>>> >> 192.168.10.46 datanode/tasktracker
>>>>> >> 192.168.10.49 datanode/tasktracker
>>>>> >> 192.168.10.50 datanode/tasktracker
>>>>> >>
>>>>> >> hdfs logs around 20:33
>>>>> >> 192.168.10.48 namenode log  http://pastebin.com/rwgmPEXR
>>>>> >> 192.168.10.45 datanode log http://pastebin.com/HBgZ8rtV (I found that this
>>>>> >> datanode crashed first)
>>>>> >> 192.168.10.46 datanode log http://pastebin.com/aQ2emnUi
>>>>> >> 192.168.10.49 datanode log http://pastebin.com/aqsWrrL1
>>>>> >> 192.168.10.50 datanode log http://pastebin.com/V7C6tjpB
>>>>> >>
>>>>> >> hbase logs around 20:33
>>>>> >> 192.168.10.48 master log http://pastebin.com/2ZfeYA1p
>>>>> >> 192.168.10.45 region log http://pastebin.com/idCF2a7Y
>>>>> >> 192.168.10.46 region log http://pastebin.com/WEh4dA0f
>>>>> >> 192.168.10.49 region log http://pastebin.com/cGtpbTLz
>>>>> >> 192.168.10.50 region log http://pastebin.com/bD6h5T6p (very strange:
>>>>> >> no log at 20:33, but there are logs at 20:32 and 20:34)
>>>>> >>
>>>>> >> On Tue, Apr 22, 2014 at 12:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> >> > Can you post more of the data node log, around 20:33 ?
>>>>> >> >
>>>>> >> > Cheers
>>>>> >> >
>>>>> >> >
>>>>> >> > On Mon, Apr 21, 2014 at 8:57 PM, Li Li <fancyerii@gmail.com> wrote:
>>>>> >> >
>>>>> >> >> hadoop 1.0
>>>>> >> >> hbase 0.94.11
>>>>> >> >>
>>>>> >> >> datanode log from 192.168.10.45. Why did it shut itself down?
>>>>> >> >>
>>>>> >> >> 2014-04-21 20:33:59,309 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7969006819959471805_202154 received exception java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
>>>>> >> >> 2014-04-21 20:33:59,310 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.10.45:50010, storageID=DS-1676697306-192.168.10.45-50010-1392029190949, infoPort=50075, ipcPort=50020):DataXceiver
>>>>> >> >> java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
>>>>> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
>>>>> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>>>> >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>> >> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>> >> >>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>>>> >> >>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>>>> >> >>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:265)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
>>>>> >> >>         at java.lang.Thread.run(Thread.java:722)
>>>>> >> >> 2014-04-21 20:33:59,310 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.10.45:50010, storageID=DS-1676697306-192.168.10.45-50010-1392029190949, infoPort=50075, ipcPort=50020):DataXceiver
>>>>> >> >> java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 466924 millis timeout left.
>>>>> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
>>>>> >> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:245)
>>>>> >> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>>>> >> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>>>>> >> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>>>>> >> >>         at java.lang.Thread.run(Thread.java:722)
>>>>> >> >> 2014-04-21 20:34:00,291 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
>>>>> >> >> 2014-04-21 20:34:00,404 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting down all async disk service threads...
>>>>> >> >> 2014-04-21 20:34:00,405 INFO org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async disk service threads have been shut down.
>>>>> >> >> 2014-04-21 20:34:00,413 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
>>>>> >> >> 2014-04-21 20:34:00,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
>>>>> >> >> /************************************************************
>>>>> >> >> SHUTDOWN_MSG: Shutting down DataNode at app-hbase-1/192.168.10.45
>>>>> >> >> ************************************************************/
>>>>> >> >>
>>>>> >> >> On Tue, Apr 22, 2014 at 11:25 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> >> >> > bq. one datanode failed
>>>>> >> >> >
>>>>> >> >> > Was the crash due to out of memory error ?
>>>>> >> >> > Can you post the tail of the data node log on pastebin?
>>>>> >> >> >
>>>>> >> >> > Giving us versions of hadoop and hbase would be helpful.
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > On Mon, Apr 21, 2014 at 7:39 PM, Li Li <fancyerii@gmail.com> wrote:
>>>>> >> >> >
>>>>> >> >> >> I have a small hbase cluster with 1 namenode, 1 secondary namenode,
>>>>> >> >> >> and 4 datanodes.
>>>>> >> >> >> The hbase master is on the same machine as the namenode, and the 4
>>>>> >> >> >> hbase slaves are on the datanode machines.
>>>>> >> >> >> I found the average requests per second was about 10,000, and the
>>>>> >> >> >> cluster crashed. I found the reason was that one datanode failed.
>>>>> >> >> >>
>>>>> >> >> >> Each datanode has about 4 cpu cores and 10GB memory.
>>>>> >> >> >> Is my cluster overloaded?
>>>>> >> >> >>
>>>>> >> >>
>>>>> >>
>>>>>
