Subject: Re: Hbase tuning for heavy write cluster
From: Rohit Dev <rohitdevel14@gmail.com>
To: user@hbase.apache.org
Date: Sun, 26 Jan 2014 03:35:05 -0800

Hi Vladimir,

Here is my cluster status:

Cluster size: 26
Server memory: 128GB
Total writes per sec (data): 450 Mbps
Writes per sec (count) per server: avg ~800 writes/sec (some spikes up to 3000 writes/sec)
Max region size: 16GB
Regions per server: ~140 (not sure if I would be able to merge some empty regions while the table is online)

We are running CDH 4.3.

Recently I changed the settings to:

Java heap size for region server: 32GB
hbase.hregion.memstore.flush.size: 536870912
hbase.hstore.blockingStoreFiles: 30
hbase.hstore.compaction.max: 15
hbase.hregion.memstore.block.multiplier: 3
hbase.regionserver.maxlogs: 90 (is it too high for a 512MB memstore flush size?)

I'm seeing weird stuff, like one region that has grown to 34GB and has 21
store files, even though MAX_FILESIZE for this table is only 16GB. Could
this be a problem?

On Sat, Jan 25, 2014 at 9:49 PM, Vladimir Rodionov wrote:
> What is the load (ingestion) rate per server in your cluster?
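[Editorial note: on the maxlogs question above, a commonly cited sizing
heuristic is to keep total retained WAL data roughly in line with the heap
that memstores may occupy. This is a rough sketch, not an official HBase
formula; the 128 MB per-WAL size (one HDFS block) and the 0.4 global
memstore fraction (the 0.94-era default for
hbase.regionserver.global.memstore.upperLimit) are assumptions.]

```python
# Heuristic sketch: retained WAL bytes ~= heap bytes that memstores can
# hold, so maxlogs ~= (heap * global memstore fraction) / WAL file size.
# Heap value is from this thread; the rest are assumed defaults.
HEAP_BYTES = 32 * 1024**3            # 32 GB region server heap (from thread)
GLOBAL_MEMSTORE_FRACTION = 0.4       # assumed upperLimit default
WAL_BYTES = 128 * 1024**2            # assume one HLog ~= one 128 MB HDFS block

suggested_maxlogs = int(HEAP_BYTES * GLOBAL_MEMSTORE_FRACTION / WAL_BYTES)
print(suggested_maxlogs)  # -> 102
```

By this estimate maxlogs=90 is in a sane range for a 32GB heap, and the old
default of 32 is what forces the "Too many hlogs" flushes seen later in the
thread.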
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Rohit Dev [rohitdevel14@gmail.com]
> Sent: Saturday, January 25, 2014 6:09 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase tuning for heavy write cluster
>
> The compaction queue is ~600 on one of the region servers, while it is
> less than 5 on the others (26 nodes total).
> The compaction queue started going up after I increased the settings[1].
> In general, one major compaction takes about 18 minutes.
>
> On the same region server I'm seeing these two log messages frequently:
>
> 2014-01-25 17:56:27,312 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs:
> logs=167, maxlogs=32; forcing flush of 1 regions(s):
> 3788648752d1c53c1ec80fad72d3e1cc
>
> 2014-01-25 17:57:48,733 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
> 'IPC Server handler 53 on 60020' on region
> tsdb,\x008WR\xE2+\x90\x00\x00\x02Qu\xF1\x00\x00(\x00\x97A\x00\x008M(7\x00\x00Bl\xE85,1390623438462.e6692a1f23b84494015d111954bf00db.:
> memstore size 1.5 G is >= than blocking 1.5 G size
>
> Any suggestion what else I can do, or is it OK to ignore these messages?
>
> [1]
> The new settings are:
> - hbase.hregion.memstore.flush.size - 536870912
> - hbase.hstore.blockingStoreFiles - 30
> - hbase.hstore.compaction.max - 15
> - hbase.hregion.memstore.block.multiplier - 3
>
> On Sat, Jan 25, 2014 at 3:00 AM, Ted Yu wrote:
>> Yes, it is normal.
>>
>> On Jan 25, 2014, at 2:12 AM, Rohit Dev wrote:
>>
>>> I changed these settings:
>>> - hbase.hregion.memstore.flush.size - 536870912
>>> - hbase.hstore.blockingStoreFiles - 30
>>> - hbase.hstore.compaction.max - 15
>>> - hbase.hregion.memstore.block.multiplier - 3
>>>
>>> Things seem to be getting better now; I'm not seeing any of those
>>> annoying 'Blocking updates' messages.
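[Editorial note: the "memstore size 1.5 G is >= than blocking 1.5 G size"
message quoted above follows directly from the settings listed in the
thread; updates to a region are blocked once its memstore exceeds
flush.size * block.multiplier.]

```python
# Why the log says "memstore size 1.5 G is >= than blocking 1.5 G size":
# both values below are the settings reported in this thread.
FLUSH_SIZE = 536870912    # hbase.hregion.memstore.flush.size (512 MB)
BLOCK_MULTIPLIER = 3      # hbase.hregion.memstore.block.multiplier

blocking_bytes = FLUSH_SIZE * BLOCK_MULTIPLIER
print(blocking_bytes / 1024**3)  # -> 1.5 (GB), matching the log line
```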
>>> Except that, I'm seeing an
>>> increase in 'Compaction Queue' size on some servers.
>>>
>>> I noticed memstores are getting flushed, but some with 'compaction
>>> requested=true'[1]. Is this normal?
>>>
>>> [1]
>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore
>>> flush of ~512.0 M/536921056, currentsize=3.0 M/3194800 for region
>>> tsdb,\x008ZR\xE1t\xC0\x00\x00\x02\x01\xB0\xF9\x00\x00(\x00\x0B]\x00\x008M((\x00\x00Bk\x9F\x0B,1390598160292.7fb65e5fd5c4cfe08121e85b7354bae9.
>>> in 3422ms, sequenceid=18522872289, compaction requested=true
>>>
>>> On Fri, Jan 24, 2014 at 6:51 PM, Bryan Beaudreault wrote:
>>>> Also, I think you can up hbase.hstore.blockingStoreFiles quite a bit
>>>> higher. You could try something like 50. It will reduce read performance
>>>> a bit, but shouldn't be too bad, especially for something like opentsdb, I
>>>> think. If you are going to up blockingStoreFiles, you're probably also
>>>> going to want to up hbase.hstore.compaction.max.
>>>>
>>>> For my tsdb cluster, which is 8 i2.4xlarges in EC2, we have 90 regions for
>>>> tsdb. We were also having issues with blocking, and I upped
>>>> blockingStoreFiles to 35, compaction.max to 15, and
>>>> memstore.block.multiplier to 3. We haven't had problems since. Memstore
>>>> flushsize for the tsdb table is 512MB.
>>>>
>>>> Finally, a 64GB heap may prove problematic, but it's worth a shot. I'd
>>>> definitely recommend Java 7 with the G1 garbage collector, though. In
>>>> general, Java has a hard time with heap sizes greater than 20-25GB
>>>> without some careful tuning.
>>>>
>>>> On Fri, Jan 24, 2014 at 9:44 PM, Bryan Beaudreault wrote:
>>>>
>>>>> It seems from your ingestion rate you are still blowing through HFiles too
>>>>> fast. You're going to want to up the MEMSTORE_FLUSHSIZE for the table from
>>>>> the default of 128MB.
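[Editorial note: a back-of-envelope check of Bryan's "blowing through
HFiles too fast" point, using the 450 Mbps / 26-server figures from earlier
in the thread. The even-spread and flush-at-exactly-flush-size assumptions
are mine; real clusters skew.]

```python
# Estimate how many flush files a server produces per hour at this ingest
# rate, comparing the 128 MB default flush size with the 512 MB Bryan uses.
CLUSTER_MBPS = 450        # total ingest in megabits/sec (from thread)
SERVERS = 26              # cluster size (from thread)

mb_per_server_hour = CLUSTER_MBPS / 8 / SERVERS * 3600  # ~7788 MB/hour
print(round(mb_per_server_hour / 128))  # flushes/hour at 128 MB -> ~61
print(round(mb_per_server_hour / 512))  # flushes/hour at 512 MB -> ~15
```

Sixty-plus new store files per server per hour keeps the compactor
permanently behind; quadrupling the flush size cuts that roughly 4x.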
>>>>> If opentsdb is the only thing on this cluster, you
>>>>> can do the math pretty easily to find the maximum allowable, based on your
>>>>> heap size and accounting for the 40% (default) used for the block cache.
>>>>>
>>>>> On Fri, Jan 24, 2014 at 9:38 PM, Rohit Dev wrote:
>>>>>
>>>>>> Hi Kevin,
>>>>>>
>>>>>> We have about 160 regions per server with a 16GB region size and 10
>>>>>> drives for HBase. I've looked at disk IO and that doesn't seem to be
>>>>>> a problem (% utilization is < 2 across all disks).
>>>>>>
>>>>>> Any suggestion what heap size I should allocate? Normally I allocate
>>>>>> 16GB.
>>>>>>
>>>>>> Also, I read that increasing hbase.hstore.blockingStoreFiles and
>>>>>> hbase.hregion.memstore.block.multiplier is a good idea for a write-heavy
>>>>>> cluster, but in my case it seems to be heading in the wrong direction.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, Jan 24, 2014 at 6:31 PM, Kevin O'dell wrote:
>>>>>>> Rohit,
>>>>>>>
>>>>>>> A 64GB heap is not ideal; you will run into some weird issues. How many
>>>>>>> regions are you running per server, how many drives in each node, any
>>>>>>> other settings changed from default?
>>>>>>> On Jan 24, 2014 6:22 PM, "Rohit Dev" wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are running Opentsdb on a CDH 4.3 HBase cluster, with mostly
>>>>>>>> default settings. The cluster is write-heavy and I'm trying to see
>>>>>>>> what parameters I can tune to optimize write performance.
>>>>>>>>
>>>>>>>> # I get messages related to Memstore[1] and Slow Response[2] very
>>>>>>>> often; is this an indication of any issue?
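[Editorial note: a sketch of the "do the math" Bryan suggests above. The
0.4 aggregate memstore fraction (hbase.regionserver.global.memstore.upperLimit
default) and the count of concurrently written regions are assumptions; the
heap size is the 32GB mentioned earlier in the thread.]

```python
# With 40% of heap for block cache and (by default) up to 40% for
# memstores, the aggregate memstore budget bounds how large a per-table
# flush size can be before concurrent writers exhaust it.
HEAP_BYTES = 32 * 1024**3     # 32 GB heap (from thread)
MEMSTORE_FRACTION = 0.4       # assumed global memstore upper limit
ACTIVE_REGIONS = 25           # assumption: regions taking writes at once

memstore_budget = HEAP_BYTES * MEMSTORE_FRACTION          # ~12.8 GB
max_flush_size = memstore_budget / ACTIVE_REGIONS         # ~524 MB
print(memstore_budget / 1024**3, max_flush_size / 1024**2)
```

Under these assumptions a 512 MB flush size supports roughly 25 actively
written regions before the global memstore limit starts forcing flushes; for
opentsdb, where writes concentrate in the recent time window, that is usually
enough.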
>>>>>>>>
>>>>>>>> I tried increasing some parameters on one node:
>>>>>>>> - hbase.hstore.blockingStoreFiles - from the default 7 to 15
>>>>>>>> - hbase.hregion.memstore.block.multiplier - from the default 2 to 8
>>>>>>>> - and heap size from 16GB to 64GB
>>>>>>>>
>>>>>>>> * The 'Compaction queue' went up to ~200 within 60 minutes after restarting
>>>>>>>> the region server with the new parameters, and the log started to get even
>>>>>>>> more noisy.
>>>>>>>>
>>>>>>>> Can anyone please suggest whether I'm going in the right direction with
>>>>>>>> these new settings? Or are there other things I could monitor or
>>>>>>>> change to make it better?
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
>>>>>>>> for 'IPC Server handler 19 on 60020' on region
>>>>>>>> tsdb,\x008XR\xE0i\x90\x00\x00\x02Q\x7F\x1D\x00\x00(\x00\x0B]\x00\x008M(r\x00\x00Bl\xA7\x8C,1390556781703.0771bf90cab25c503d3400206417f6bf.:
>>>>>>>> memstore size 256.3 M is >= than blocking 256 M size
>>>>>>>>
>>>>>>>> [2]
>>>>>>>> WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
>>>>>>>> {"processingtimems":17887,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@586940ea),
>>>>>>>> rpc version=1, client version=29,
>>>>>>>> methodsFingerPrint=0","client":"192.168.10.10:54132",
>>>>>>>> "starttimems":1390587959182,"queuetimems":1498,"class":"HRegionServer","responsesize":0,"method":"multi"}
>>>>>
>>>>>
>
> Confidentiality Notice: The information contained in this message, including
> any attachments hereto, may be confidential and is intended to be read only
> by the individual or entity to whom this message is addressed. If the reader
> of this message is not the intended recipient or an agent or designee of the
> intended recipient, please note that any review, use, disclosure or
> distribution of this message or its attachments, in any form, is strictly
> prohibited.
> If you have received this message in error, please immediately notify the
> sender and/or Notifications@carrieriq.com and delete or destroy any copy of
> this message and its attachments.