Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of jiao.tao@oracle.com designates
 141.146.126.69 as permitted sender)
Message-ID: <53CD42E5.6000905@oracle.com>
Date: Mon, 21 Jul 2014 09:42:13 -0700
From: Jane Tao <jiao.tao@oracle.com>
Organization: Oracle Corporation
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:17.0) Gecko/20130307 Thunderbird/17.0.4
MIME-Version: 1.0
To: stack@duboce.net
CC: user@hbase.apache.org
Subject: Re: BucketCache Configuration
References: <53C6F75C.60108@oracle.com>
 <1405548870.94850.YahooMailNeo@web190104.mail.sg3.yahoo.com>
 <CALte62xgkVUxy0yHv+RH1zb3agaTjejrhA_582myKxEpYLfeKw@mail.gmail.com>
 <53C845D3.9020908@oracle.com>
 <CALte62ygv2eF1AV6LtJ02ghuFBReGWmjWszO5So7m-QRt1O_Mw@mail.gmail.com>
 <53C9B1C1.1070305@oracle.com>
 <CADcMMgE2dUMXCCfn1MfpZNsFpq6xAsQX5+LoMrezJd4eTuHz4Q@mail.gmail.com>
In-Reply-To: 
 <CADcMMgE2dUMXCCfn1MfpZNsFpq6xAsQX5+LoMrezJd4eTuHz4Q@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi Stack,

Does what you suggested apply to HBase 0.94.6?

Thanks,
Jane

On 7/18/2014 5:11 PM, Stack wrote:
> On Fri, Jul 18, 2014 at 4:46 PM, Jane Tao <jiao.tao@oracle.com> wrote:
>
>> Hi there,
>>
>> Our goal is to fully utilize the free RAM on each node/region server for
>> HBase. At the same time, we do not want to incur too much pressure from GC
>> (garbage collection). Based on Ted's suggestion, we are trying to use the
>> bucket cache.
>>
>> However, we are not sure:
>>
> Sorry.  Config is a little complicated at the moment.  It has had some
> cleanup in trunk.  Meantime...
>
>
>
>> - The relation between -XX:MaxDirectMemorySize and the Java heap size. Is
>> MaxDirectMemorySize part of the Java heap size?
>>
>
> No.  It is the maximum for how much the JVM should use OFFHEAP.  Here is a
> bit of a note I just added to the refguide:
>
>
>   <para>The default maximum direct memory varies by JVM. Traditionally it
>     is 64M or some relation to allocated heap size (-Xmx) or no limit at
>     all (JDK7 apparently). HBase servers use direct memory; in particular,
>     with short-circuit reading, the hosted DFSClient will allocate direct
>     memory buffers. If you do offheap block caching, you'll be making use
>     of direct memory. Starting your JVM, make sure the
>     <varname>-XX:MaxDirectMemorySize</varname> setting in
>     <filename>conf/hbase-env.sh</filename> is set to some value that is
>     higher than what you have allocated to your offheap blockcache
>     (<varname>hbase.bucketcache.size</varname>). It should be larger than
>     your offheap block cache and then some for DFSClient usage (how much
>     the DFSClient uses is not easy to quantify; it is the number of open
>     hfiles * <varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname>,
>     where hbase.dfs.client.read.shortcircuit.buffer.size is set to 128k in
>     HBase; see the <filename>hbase-default.xml</filename> default
>     configurations).
>   </para>
>
>
>
>> - The relation between -XX:MaxDirectMemorySize and hbase.bucketcache.size.
>> Are they equal?
>>
> -XX:MaxDirectMemorySize should be larger than hbase.bucketcache.size.  They
> should not be equal.  See note above for why.
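
A rough sketch of the sizing rule from the refguide note, in case it helps
make the arithmetic concrete. It assumes hbase.bucketcache.size is given in
MB (as in the 6144 value in this thread) and uses the 128k short-circuit
buffer size the note mentions. The function name and the open-hfile count
are made up for illustration; the real number of open hfiles has to be
measured on the region server.

```python
# Rough lower bound for -XX:MaxDirectMemorySize, following the refguide note:
# offheap block cache + (open hfiles * short-circuit read buffer size).

SHORTCIRCUIT_BUFFER_BYTES = 128 * 1024  # 128k, per the refguide note


def min_direct_memory_mb(bucketcache_size_mb, open_hfiles):
    """Return the smallest direct-memory budget (in MB, rounded up) that
    covers the bucket cache plus the DFSClient short-circuit buffers."""
    dfs_bytes = open_hfiles * SHORTCIRCUIT_BUFFER_BYTES
    dfs_mb = -(-dfs_bytes // (1024 * 1024))  # ceiling division to whole MB
    return bucketcache_size_mb + dfs_mb


if __name__ == "__main__":
    # open_hfiles=2000 is a hypothetical figure, not a measured one.
    print(min_direct_memory_mb(6144, 2000))
```

This is only a floor. With a 6144 MB bucket cache and a 6GB direct memory
limit, the two values are equal, so nothing is left over for the DFSClient
buffers.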
>
>
>
>> - How to adjust hbase.bucketcache.percentage.in.combinedcache?
>>
>>
> Or just leave it as is.  To adjust, just set it to other than the default
> which is 0.9 (0.9 of hbase.bucketcache.size).  This configuration has been
> removed from trunk because it is confusing.
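
A small sketch of what that percentage does to the numbers in this thread.
The bucket cache share (percentage * hbase.bucketcache.size) is as described
above. Treating the remainder as the on-heap LRU share of the combined cache
is an assumption on my part and should be checked against the release in
use. The function name is hypothetical.

```python
def split_combined_cache(bucketcache_size_mb, percentage_in_combinedcache=0.9):
    """Split hbase.bucketcache.size (MB) into the bucket cache share and the
    remainder, rounded to whole MB.

    Assumption: the remainder is what the on-heap LRU cache gets in the
    combined cache; verify this against your HBase release.
    """
    bucket_mb = round(bucketcache_size_mb * percentage_in_combinedcache)
    remainder_mb = bucketcache_size_mb - bucket_mb
    return bucket_mb, remainder_mb


if __name__ == "__main__":
    print(split_combined_cache(40000))      # default percentage of 0.9
    print(split_combined_cache(6144, 0.8))  # the values from the config below
```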
>
>
>
>> Right now, we have the following configuration. Does it make sense?
>>
>> - Java heap size of each HBase region server set to 12 GB
>> - -XX:MaxDirectMemorySize set to 6GB
>>
> Why not set it to 48G since you have the RAM?
>
>
>
>> - hbase-site.xml :
>>    <property>
>>      <name>hbase.offheapcache.percentage</name>
>>      <value>0</value>
>>    </property>
>>
> This setting is not needed.  0 is the default.
>
>
>>    <property>
>>      <name>hbase.bucketcache.ioengine</name>
>>      <value>offheap</value>
>>    </property>
>>    <property>
>>      <name>hbase.bucketcache.percentage.in.combinedcache</name>
>>      <value>0.8</value>
>>    </property>
>>
> Or you could just undo this setting and go with the default which is 0.9.
>
>
>>    <property>
>>      <name>hbase.bucketcache.size</name>
>>      <value>6144</value>
>>    </property>
>>
>>
> Adjust this to be 40000? (smile).
> Let us know how it goes.
>
> What version of HBase are you running?  Thanks.
>
> St.Ack
>
>
>
>> Thanks,
>> Jane
>>
>>
>> On 7/17/2014 3:05 PM, Ted Yu wrote:
>>
>>> Have you considered using BucketCache ?
>>>
>>> Please read 9.6.4.1 under
>>> http://hbase.apache.org/book.html#regionserver.arch
>>>
>>> Note: remember to verify the config values against the hbase release
>>> you're using.
>>>
>>> Cheers
>>>
>>>
>>> On Thu, Jul 17, 2014 at 2:53 PM, Jane Tao <jiao.tao@oracle.com> wrote:
>>>
>>>> Hi Ted,
>>>>
>>>> In my case, there is a 6-node HBase cluster (running on Oracle BDA).
>>>> Each node has plenty of RAM (64GB) and CPU cores. Several articles seem
>>>> to suggest that it is not a good idea to allocate too much RAM to the
>>>> region server's heap.
>>>>
>>>> If each region server has a 10GB heap and there is only one region
>>>> server per node, then I have 10x6=60GB for the whole HBase cluster. This
>>>> setting is good for ~100M rows but starts to incur a lot of GC activity
>>>> on the region servers when loading billions of rows.
>>>>
>>>> Basically, I need a configuration that can fully utilize the free RAM on
>>>> each node for HBase.
>>>>
>>>> Thanks,
>>>> Jane
>>>> On 7/16/2014 4:17 PM, Ted Yu wrote:
>>>>
>>>>> Jane:
>>>>> Can you briefly describe the use case where multiple region servers are
>>>>> needed on the same host ?
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 16, 2014 at 3:14 PM, Dhaval Shah
>>>>> <prince_mithibai@yahoo.co.in> wrote:
>>>>>
>>>>>> It's certainly possible (at least with the command line) but probably
>>>>>> very messy. You will need to have different ports, different log files,
>>>>>> different pid files, possibly even different configs on the same
>>>>>> machine.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Dhaval
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>>     From: Jane Tao <jiao.tao@oracle.com>
>>>>>> To: user@hbase.apache.org
>>>>>> Sent: Wednesday, 16 July 2014 6:06 PM
>>>>>> Subject: multiple region servers at one machine
>>>>>>
>>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> Is it possible to run multiple region servers on one machine/node? If
>>>>>> so, how do we start multiple region servers from the command line or
>>>>>> Cloudera Manager?
>>>>>>
>>>>>> Thanks,
>>>>>> Jane
>>>>>>
>>>>>>