hbase-user mailing list archives

From: Jane Tao <jiao....@oracle.com>
Subject: Re: BucketCache Configuration
Date: Thu, 24 Jul 2014 23:11:58 GMT
Hi Stack,

We are using HBase 0.96.1.1+cdh5.0.1+68.

In hbase-site.xml:
   <property>
     <name>hbase.bucketcache.ioengine</name>
     <value>offheap</value>
   </property>
   <property>
     <name>hbase.bucketcache.percentage.in.combinedcache</name>
     <value>0.8</value>
   </property>
   <property>
     <name>hbase.bucketcache.size</name>
     <value>40000</value>
   </property>

The Java heap size for the region server is 32 GB. MaxDirectMemorySize is 48 GB:

-Xms34359738368 -Xmx34359738368 -XX:MaxDirectMemorySize=48g 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled 
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
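
In case it is useful, here is roughly how those options would be wired up in
conf/hbase-env.sh (a sketch only -- on this CDH cluster the Java options are
managed through Cloudera Manager, so the variable name and quoting below are
illustrative rather than exactly what is on disk):

   export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
     -Xms34359738368 -Xmx34359738368 -XX:MaxDirectMemorySize=48g \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled \
     -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled"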

Does the above configuration make sense for bucket cache?
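
My back-of-the-envelope check of the sizing, based on the refguide note you
quoted below (the 0.8 split is just my reading of
hbase.bucketcache.percentage.in.combinedcache, so please correct me if I have
it wrong):

   hbase.bucketcache.size                  = 40000 MB  (~39 GB)
   off-heap bucket cache = 0.8 * 40000     = 32000 MB
   on-heap LRU remainder = 0.2 * 40000     =  8000 MB  (my assumption about
                                                        where the rest goes)
   -XX:MaxDirectMemorySize                 = 48 GB = 49152 MB
   DFSClient headroom   >= 49152 - 40000   ~  9 GB     (open hfiles * 128 KB
                                                        short-circuit buffers)

So even if the full 40000 MB ended up off-heap, there would still be roughly
9 GB under the direct-memory cap for the DFSClient's short-circuit read
buffers.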


Thanks,
Jane

On 7/21/2014 2:05 PM, Stack wrote:
> No. You need 0.96.x HBase at least.
> St.Ack
>
>
> On Mon, Jul 21, 2014 at 9:42 AM, Jane Tao <jiao.tao@oracle.com> wrote:
>
>> Hi Stack,
>>
>> Does what you suggested apply to HBase 0.94.6?
>>
>> Thanks,
>> Jane
>>
>>
>> On 7/18/2014 5:11 PM, Stack wrote:
>>
>>> On Fri, Jul 18, 2014 at 4:46 PM, Jane Tao <jiao.tao@oracle.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> Our goal is to fully utilize the free RAM on each node/region server for
>>>> HBase. At the same time, we do not want to incur too much pressure from
>>>> GC (garbage collection). Based on Ted's suggestion, we are trying to use
>>>> the bucket cache.
>>>>
>>>> However, we are not sure:
>>>>
>>> Sorry.  Config is a little complicated at the moment.  It has had some
>>> cleanup in trunk.  Meantime...
>>>
>>>
>>>
>>>> - The relation between -XX:MaxDirectMemorySize and the Java heap size.
>>>>   Is MaxDirectMemorySize part of the Java heap size?
>>>>
>>> No.  It is the maximum amount of memory the JVM should use off-heap.  Here is a
>>> bit of a note I just added to the refguide:
>>>
>>>
>>>   <para>The default maximum direct memory varies by JVM.  Traditionally it
>>>   is 64M or some relation to allocated heap size (-Xmx) or no limit at all
>>>   (JDK7 apparently).  HBase servers use direct memory, in particular
>>>   short-circuit reading, the hosted DFSClient will allocate direct memory
>>>   buffers.  If you do offheap block caching, you'll be making use of
>>>   direct memory.  Starting your JVM, make sure the
>>>   <varname>-XX:MaxDirectMemorySize</varname> setting in
>>>   <filename>conf/hbase-env.sh</filename> is set to some value that is
>>>   higher than what you have allocated to your offheap blockcache
>>>   (<varname>hbase.bucketcache.size</varname>).  It should be larger than
>>>   your offheap block cache and then some for DFSClient usage (How much the
>>>   DFSClient uses is not easy to quantify; it is the number of open hfiles *
>>>   <varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname> where
>>>   hbase.dfs.client.read.shortcircuit.buffer.size is set to 128k in HBase --
>>>   see <filename>hbase-default.xml</filename> default configurations).
>>>   </para>
>>>
>>>
>>>
>>>> - The relation between -XX:MaxDirectMemorySize and hbase.bucketcache.size.
>>>>   Are they equal?
>>>>
>>> -XX:MaxDirectMemorySize should be larger than hbase.bucketcache.size.  They
>>> should not be equal.  See note above for why.
>>>
>>>
>>>
>>>> - How to adjust hbase.bucketcache.percentage.in.combinedcache?
>>>>
>>> Or just leave it as is.  To adjust, just set it to other than the default
>>> which is 0.9 (0.9 of hbase.bucketcache.size).  This configuration has been
>>> removed from trunk because it is confusing.
>>>
>>>
>>>
>>>> Right now, we have the following configuration. Does it make sense?
>>>> - Java heap size of each HBase region server set to 12 GB
>>>> - -XX:MaxDirectMemorySize set to 6 GB
>>>>
>>> Why not set it to 48G since you have the RAM?
>>>
>>>
>>>> - hbase-site.xml:
>>>>     <property>
>>>>       <name>hbase.offheapcache.percentage</name>
>>>>       <value>0</value>
>>>>     </property>
>>>>
>>> This setting is not needed.  0 is the default.
>>>
>>>>     <property>
>>>>       <name>hbase.bucketcache.ioengine</name>
>>>>       <value>offheap</value>
>>>>     </property>
>>>>     <property>
>>>>       <name>hbase.bucketcache.percentage.in.combinedcache</name>
>>>>       <value>0.8</value>
>>>>     </property>
>>>>
>>> Or you could just undo this setting and go with the default, which is 0.9.
>>>
>>>
>>>>     <property>
>>>>       <name>hbase.bucketcache.size</name>
>>>>       <value>6144</value>
>>>>     </property>
>>>>
>>> Adjust this to be 40000? (smile)
>>> Let us know how it goes.
>>>
>>> What version of HBase are you running?  Thanks.
>>>
>>> St.Ack
>>>
>>>
>>>
>>>> Thanks,
>>>> Jane
>>>>
>>>>
>>>> On 7/17/2014 3:05 PM, Ted Yu wrote:
>>>>
>>>>> Have you considered using BucketCache?
>>>>> Please read 9.6.4.1 under
>>>>> http://hbase.apache.org/book.html#regionserver.arch
>>>>>
>>>>> Note: remember to verify the config values against the HBase release
>>>>> you're using.
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 2:53 PM, Jane Tao <jiao.tao@oracle.com> wrote:
>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> In my case, there is a 6-node HBase cluster setup (running on Oracle
>>>>>> BDA).  Each node has plenty of RAM (64 GB) and CPU cores.  Several
>>>>>> articles seem to suggest that it is not a good idea to allocate too
>>>>>> much RAM to the region server's heap setting.
>>>>>>
>>>>>> If each region server has a 10 GB heap and there is only one region
>>>>>> server per node, then I have 10x6=60 GB for the whole HBase cluster.
>>>>>> This setting is good for ~100M rows but starts to incur lots of GC
>>>>>> activity on the region servers when loading billions of rows.
>>>>>>
>>>>>> Basically, I need a configuration that can fully utilize the free RAM
>>>>>> on each node for HBase.
>>>>>>
>>>>>> Thanks,
>>>>>> Jane
>>>>>> On 7/16/2014 4:17 PM, Ted Yu wrote:
>>>>>>
>>>>>>> Jane:
>>>>>>>
>>>>>>> Can you briefly describe the use case where multiple region servers
>>>>>>> are needed on the same host?
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 16, 2014 at 3:14 PM, Dhaval Shah
>>>>>>> <prince_mithibai@yahoo.co.in> wrote:
>>>>>>>
>>>>>>>> It's certainly possible (at least with the command line) but probably
>>>>>>>> very messy.  You will need to have different ports, different log
>>>>>>>> files, different pid files, possibly even different configs on the
>>>>>>>> same machine.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dhaval
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________
>>>>>>>> From: Jane Tao <jiao.tao@oracle.com>
>>>>>>>> To: user@hbase.apache.org
>>>>>>>> Sent: Wednesday, 16 July 2014 6:06 PM
>>>>>>>> Subject: multiple region servers at one machine
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> Is it possible to run multiple region servers on one machine/node?
>>>>>>>> If this is possible, how do we start multiple region servers from
>>>>>>>> the command line or with Cloudera Manager?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jane
>>>>>>>>
>>>>>>>>
