hbase-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: how to make tuning for hbase (every couple of days hbase region sever/s crashe)
Date Tue, 23 Aug 2011 23:35:22 GMT
So, if you use 0.5 GB / mapper and 1 GB / reducer, your total memory consumption (minus hbase)
on a slave node should be:
4 GB M/R tasks (4 mappers x 0.5 GB + 2 reducers x 1 GB)
1 GB OS -- just a guess
1 GB datanode
1 GB tasktracker
Leaving you with up to 9 GB of the 16 GB for your region servers.  I would suggest bumping your region
server RAM up to 8 GB and leaving 1 GB for OS caching. [I am sure someone out there will tell
me I am crazy]
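
The arithmetic above can be written out as a small sketch. The function and variable names are illustrative only, and the 1 GB figures for the OS, datanode, and tasktracker are the guesses from this message, not measured values.

```python
# Per-node memory budget, using the figures quoted in this thread.
# All names are illustrative; the 1 GB OS figure is a guess.
NODE_RAM_GB = 16


def mapreduce_gb(mappers, mapper_gb, reducers, reducer_gb):
    """Worst-case heap used by all M/R task slots on one node."""
    return mappers * mapper_gb + reducers * reducer_gb


def regionserver_headroom_gb(node_ram_gb, mr_gb, os_gb=1, datanode_gb=1, tasktracker_gb=1):
    """RAM left for the region server after everything else is accounted for."""
    return node_ram_gb - (mr_gb + os_gb + datanode_gb + tasktracker_gb)


mr = mapreduce_gb(mappers=4, mapper_gb=0.5, reducers=2, reducer_gb=1)
headroom = regionserver_headroom_gb(NODE_RAM_GB, mr)
print(f"M/R tasks: {mr} GB, left for region server: {headroom} GB")
```

Changing the slot counts or per-task heap sizes shows how much each setting takes away from the region server.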

However, it is the log that is the most useful part of your email.  Unfortunately I haven't
seen that error before.
Are you using the Multi methods a lot in your code?


-----Original Message-----
From: Oleg Ruchovets [mailto:oruchovets@gmail.com] 
Sent: Tuesday, August 23, 2011 1:38 PM
To: user@hbase.apache.org
Subject: Re: how to make tuning for hbase (every couple of days hbase region sever/s crashe)

Thank you for the detailed response,

On Tue, Aug 23, 2011 at 7:49 PM, Buttler, David <buttler1@llnl.gov> wrote:

> Have you looked at the logs of the region servers?  That is a good first
> place to look.

> How many regions are in your system?

Region Servers

Address   Start Code     Load
hadoop01  1314007529600  requests=0, regions=212, usedHeap=3171, maxHeap=3983
hadoop02  1314007496109  requests=0, regions=207, usedHeap=2185, maxHeap=3983
hadoop03  1314008874001  requests=0, regions=208, usedHeap=1955, maxHeap=3983
hadoop04  1314008965432  requests=0, regions=209, usedHeap=2034, maxHeap=3983
hadoop05  1314007496533  requests=0, regions=208, usedHeap=1970, maxHeap=3983
hadoop06  1314008874036  requests=0, regions=208, usedHeap=1987, maxHeap=3983
hadoop07  1314007496927  requests=0, regions=209, usedHeap=2118, maxHeap=3983
hadoop08  1314007497034  requests=0, regions=211, usedHeap=2568, maxHeap=3983
hadoop09  1314007497221  requests=0, regions=209, usedHeap=2148, maxHeap=3983
master    1314008873765  requests=0, regions=208, usedHeap=2007,
Total: servers: 10  requests=0, regions=2089
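
As a rough way to watch these numbers over time, the Load strings can be parsed and compared against a heap threshold. This is only a sketch: the 0.75 threshold is a made-up alert level, not an HBase default, and the heap values are assumed to be in MB.

```python
import re

# A few "Load" strings copied from the status table above.
LOADS = {
    "hadoop01": "requests=0, regions=212, usedHeap=3171, maxHeap=3983",
    "hadoop02": "requests=0, regions=207, usedHeap=2185, maxHeap=3983",
    "hadoop08": "requests=0, regions=211, usedHeap=2568, maxHeap=3983",
}

THRESHOLD = 0.75  # hypothetical alert level, not an HBase default


def parse_load(load):
    """Turn 'key=value, key=value' into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", load)}


def heap_ratio(load):
    """Fraction of the maximum heap currently in use."""
    fields = parse_load(load)
    return fields["usedHeap"] / fields["maxHeap"]


for host, load in LOADS.items():
    ratio = heap_ratio(load)
    flag = "  <-- check GC logs" if ratio > THRESHOLD else ""
    print(f"{host}: {ratio:.0%} of heap used{flag}")
```

In this snapshot only hadoop01 is above the assumed threshold.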

Most of the time GC manages to clean up, but every 3-4 days used memory
gets close to 4G

and there are a lot of exceptions like this:

   org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4) from output error
2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
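
To see whether these exceptions cluster around the crash dates, one option is to count them per day in the region server log. The sketch below assumes every log entry starts with a timestamp, level, and logger name, as in the WARN entry quoted above. The helper name `count_by_day` is made up for illustration.

```python
import re
from collections import Counter

# Matches the start of a log entry: date, time, level, logger, message.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2},\d{3} (\w+) (\S+): (.*)$")


def count_by_day(lines, needle):
    """Count log entries per day whose message contains `needle`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and needle in match.group(4):
            counts[match.group(1)] += 1
    return counts


sample = [
    "2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server "
    "handler 24 on 8041 caught: java.nio.channels.ClosedChannelException",
    "        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)",
]
print(count_by_day(sample, "ClosedChannelException"))
```

Stack-trace continuation lines do not match the pattern, so each exception is counted once.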

>  If you are using MSLAB, it reserves 2MB/region as a buffer -- that can add
> up when you have lots of regions.
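
For a sense of scale, here is the MSLAB estimate worked out with the region count and heap size reported for hadoop01 in this thread. It takes the 2 MB/region figure quoted above at face value, so treat the result as a rough estimate.

```python
# Rough MSLAB overhead for one region server, using numbers from this thread.
MSLAB_MB_PER_REGION = 2   # figure quoted above
regions = 212             # hadoop01 in the status table
max_heap_mb = 3983        # maxHeap reported for each region server

mslab_mb = regions * MSLAB_MB_PER_REGION
share = mslab_mb / max_heap_mb
print(f"MSLAB buffers: {mslab_mb} MB ({share:.1%} of a {max_heap_mb} MB heap)")
```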

> Given so little information all my guesses are going to be wild, but they
> might help:
> 4GB may not be enough for your current load.

> Have you considered changing your memory allocation, giving less to your
> map/reduce jobs and more to HBase?
  Interesting point. Can you advise how the m/r memory allocation should relate
to the hbase region server memory?

  Currently we have 512m for map (4 maps per machine) and 1024m for reduce (2
reducers per machine).

> What is your key distribution like?

> Are you writing to all regions equally, or are you hotspotting on one
> region?

Every day, before running the job, we manually allocate regions
with lexicographic start and end keys to get a good distribution and
prevent hot-spots.
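
One common way to choose such boundaries is to take evenly spaced quantiles from a sorted sample of the expected row keys. The sketch below only illustrates that idea and is not the procedure actually used here. The key format is hypothetical, and Python string ordering matches HBase byte ordering only for plain ASCII keys.

```python
def split_keys(sample_keys, num_regions):
    """Pick num_regions - 1 boundary keys at even quantiles of a key sample."""
    if num_regions < 2:
        return []
    keys = sorted(set(sample_keys))
    if len(keys) < num_regions:
        raise ValueError("need at least as many distinct keys as regions")
    step = len(keys) / num_regions
    return [keys[int(i * step)] for i in range(1, num_regions)]


# Hypothetical row keys; the real key format is not given in the thread.
sample = [f"user{n:04d}" for n in range(1000)]
print(split_keys(sample, 4))
```

The resulting keys would serve as the start keys of regions 2 through N, with the first region starting at the empty key.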

> Check your cell/row sizes.  Are they really large (e.g. cells > 1 MB; rows
> > 100 MB)?  Increasing region size should help here, but there may be an
> issue with your RAM allocation for HBase.
I'll check, but I am almost sure that we have no rows > 100 MB. We changed the
region size to 500 MB to prevent automatic splits (after a successful insert job
we have ~200-250 MB files per region),
and for the next day we allocate new ones.

> Are you sure that you are not overloading the machine memory? How much RAM
> do you allocate for map reduce jobs?
    512M  -- map
    1024M -- reduce

> How do you distribute your processes over machines?  Does your master run
> namenode, hmaster, jobtracker, and zookeeper, while your slaves run
> datanode, tasktracker, and hregionserver?

Exactly, that is our process distribution.
We have 16G RAM on the ordinary machines
and 48G RAM on the master, so I am not sure that I understand your calculation;
please clarify.

> If so, then your memory allocation is:
> 4 GB for regionserver
> 1 GB for OS
> 1 GB for datanode
> 1 GB for tasktracker
> 9/6 GB for M/R
> So, are you sure that all of your m/r tasks take less than 1 GB?
> Dave
> -----Original Message-----
> From: Oleg Ruchovets [mailto:oruchovets@gmail.com]
> Sent: Tuesday, August 23, 2011 2:15 AM
> To: user@hbase.apache.org
> Subject: how to make tuning for hbase (every couple of days hbase region
> sever/s crashe)
> Hi,
>  Our environment:
> hbase 0.90.2 (10 machines)
>    We have a 10-machine grid:
>    the master has 48G RAM
>    slave machines have 16G RAM
>    the Region Server process has 4G RAM
>    the Zookeeper process has 2G RAM
>    We have 4 maps / 2 reducers per machine
> We write from an m/r job to hbase (2 jobs a day).  For 3 months the system worked
> without any problems, but now a region server crashes every 3-4 days.
>   What we have done so far:
>   1) We run a major compaction manually once a day.
>   2) We increased the region size to prevent automatic splits.
> Questions:
>   How should HBase be tuned?
>   How do we debug such a problem? It is still not clear to me what
> the root cause of the region server crashes is.
>   We started from this post:
> http://search-hadoop.com/m/HDoK22ikTCI/M%252FR+vs+hbase+problem+in+production&subj=M+R+vs+hbase+problem+in+production
> Regards
> Oleg.
