Subject: Re: Hbase tuning for heavy write cluster
From: Rohit Dev <rohitdevel14@gmail.com>
To: user@hbase.apache.org
Date: Sun, 26 Jan 2014 03:35:05 -0800

Hi Vladimir,

Here is my cluster status:

Cluster size: 26
Server memory: 128GB
Total writes per sec (data): 450 Mbps
Writes per sec (count) per server: avg ~800 writes/sec (some spikes up to 3000 writes/sec)
Max region size: 16GB
Regions per server: ~140 (not sure if I would be able to merge some empty regions while the table is online)

We are running CDH 4.3.

Recently I changed the settings to:

Java heap size for region server: 32GB
hbase.hregion.memstore.flush.size: 536870912
hbase.hstore.blockingStoreFiles: 30
hbase.hstore.compaction.max: 15
hbase.hregion.memstore.block.multiplier: 3
hbase.regionserver.maxlogs: 90 (is it too high for a 512MB memstore flush size?)

I'm seeing weird stuff, like one region that has grown to 34GB and has 21
store files, even though MAX_FILESIZE for this table is only 16GB. Could
this be a problem?

On Sat, Jan 25, 2014 at 9:49 PM, Vladimir Rodionov wrote:
> What is the load (ingestion) rate per server in your cluster?
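[Editorial note: on the maxlogs question above, a commonly cited sizing
heuristic is to keep total retained WAL data roughly in line with the heap
that memstores may occupy. This is a rough sketch, not an official HBase
formula; the 128 MB per-WAL size (one HDFS block) and the 0.4 global
memstore fraction (the 0.94-era default for
hbase.regionserver.global.memstore.upperLimit) are assumptions.]

```python
# Heuristic sketch: retained WAL bytes ~= heap bytes that memstores can
# hold, so maxlogs ~= (heap * global memstore fraction) / WAL file size.
# Heap value is from this thread; the rest are assumed defaults.
HEAP_BYTES = 32 * 1024**3            # 32 GB region server heap (from thread)
GLOBAL_MEMSTORE_FRACTION = 0.4       # assumed upperLimit default
WAL_BYTES = 128 * 1024**2            # assume one HLog ~= one 128 MB HDFS block

suggested_maxlogs = int(HEAP_BYTES * GLOBAL_MEMSTORE_FRACTION / WAL_BYTES)
print(suggested_maxlogs)  # -> 102
```

By this estimate maxlogs=90 is in a sane range for a 32GB heap, and the old
default of 32 is what forces the "Too many hlogs" flushes seen later in the
thread.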
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Rohit Dev [rohitdevel14@gmail.com]
> Sent: Saturday, January 25, 2014 6:09 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase tuning for heavy write cluster
>
> The compaction queue is ~600 on one of the region servers, while it is
> less than 5 on the others (26 nodes total).
> The compaction queue started going up after I increased the settings[1].
> In general, one major compaction takes about 18 minutes.
>
> On the same region server I'm seeing these two log messages frequently:
>
> 2014-01-25 17:56:27,312 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs:
> logs=167, maxlogs=32; forcing flush of 1 regions(s):
> 3788648752d1c53c1ec80fad72d3e1cc
>
> 2014-01-25 17:57:48,733 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
> 'IPC Server handler 53 on 60020' on region
> tsdb,\x008WR\xE2+\x90\x00\x00\x02Qu\xF1\x00\x00(\x00\x97A\x00\x008M(7\x00\x00Bl\xE85,1390623438462.e6692a1f23b84494015d111954bf00db.:
> memstore size 1.5 G is >= than blocking 1.5 G size
>
> Any suggestion what else I can do, or is it OK to ignore these messages?
>
> [1]
> The new settings are:
> - hbase.hregion.memstore.flush.size - 536870912
> - hbase.hstore.blockingStoreFiles - 30
> - hbase.hstore.compaction.max - 15
> - hbase.hregion.memstore.block.multiplier - 3
>
> On Sat, Jan 25, 2014 at 3:00 AM, Ted Yu wrote:
>> Yes, it is normal.
>>
>> On Jan 25, 2014, at 2:12 AM, Rohit Dev wrote:
>>
>>> I changed these settings:
>>> - hbase.hregion.memstore.flush.size - 536870912
>>> - hbase.hstore.blockingStoreFiles - 30
>>> - hbase.hstore.compaction.max - 15
>>> - hbase.hregion.memstore.block.multiplier - 3
>>>
>>> Things seem to be getting better now; I'm not seeing any of those
>>> annoying 'Blocking updates' messages.
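[Editorial note: the "memstore size 1.5 G is >= than blocking 1.5 G size"
message quoted above follows directly from the settings listed in the
thread; updates to a region are blocked once its memstore exceeds
flush.size * block.multiplier.]

```python
# Why the log says "memstore size 1.5 G is >= than blocking 1.5 G size":
# both values below are the settings reported in this thread.
FLUSH_SIZE = 536870912    # hbase.hregion.memstore.flush.size (512 MB)
BLOCK_MULTIPLIER = 3      # hbase.hregion.memstore.block.multiplier

blocking_bytes = FLUSH_SIZE * BLOCK_MULTIPLIER
print(blocking_bytes / 1024**3)  # -> 1.5 (GB), matching the log line
```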
>>> Except that, I'm seeing an
>>> increase in 'Compaction Queue' size on some servers.
>>>
>>> I noticed memstores are getting flushed, but some with 'compaction
>>> requested=true'[1]. Is this normal?
>>>
>>> [1]
>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore
>>> flush of ~512.0 M/536921056, currentsize=3.0 M/3194800 for region
>>> tsdb,\x008ZR\xE1t\xC0\x00\x00\x02\x01\xB0\xF9\x00\x00(\x00\x0B]\x00\x008M((\x00\x00Bk\x9F\x0B,1390598160292.7fb65e5fd5c4cfe08121e85b7354bae9.
>>> in 3422ms, sequenceid=18522872289, compaction requested=true
>>>
>>> On Fri, Jan 24, 2014 at 6:51 PM, Bryan Beaudreault wrote:
>>>> Also, I think you can up hbase.hstore.blockingStoreFiles quite a bit
>>>> higher. You could try something like 50. It will reduce read performance
>>>> a bit, but shouldn't be too bad, especially for something like opentsdb, I
>>>> think. If you are going to up blockingStoreFiles, you're probably also
>>>> going to want to up hbase.hstore.compaction.max.
>>>>
>>>> For my tsdb cluster, which is 8 i2.4xlarges in EC2, we have 90 regions for
>>>> tsdb. We were also having issues with blocking, and I upped
>>>> blockingStoreFiles to 35, compaction.max to 15, and
>>>> memstore.block.multiplier to 3. We haven't had problems since. Memstore
>>>> flushsize for the tsdb table is 512MB.
>>>>
>>>> Finally, a 64GB heap may prove problematic, but it's worth a shot. I'd
>>>> definitely recommend Java 7 with the G1 garbage collector, though. In
>>>> general, Java has a hard time with heap sizes greater than 20-25GB
>>>> without some careful tuning.
>>>>
>>>> On Fri, Jan 24, 2014 at 9:44 PM, Bryan Beaudreault wrote:
>>>>
>>>>> It seems from your ingestion rate you are still blowing through HFiles too
>>>>> fast. You're going to want to up the MEMSTORE_FLUSHSIZE for the table from
>>>>> the default of 128MB.
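[Editorial note: a back-of-envelope check of Bryan's "blowing through
HFiles too fast" point, using the 450 Mbps / 26-server figures from earlier
in the thread. The even-spread and flush-at-exactly-flush-size assumptions
are mine; real clusters skew.]

```python
# Estimate how many flush files a server produces per hour at this ingest
# rate, comparing the 128 MB default flush size with the 512 MB Bryan uses.
CLUSTER_MBPS = 450        # total ingest in megabits/sec (from thread)
SERVERS = 26              # cluster size (from thread)

mb_per_server_hour = CLUSTER_MBPS / 8 / SERVERS * 3600  # ~7788 MB/hour
print(round(mb_per_server_hour / 128))  # flushes/hour at 128 MB -> ~61
print(round(mb_per_server_hour / 512))  # flushes/hour at 512 MB -> ~15
```

Sixty-plus new store files per server per hour keeps the compactor
permanently behind; quadrupling the flush size cuts that roughly 4x.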
>>>>> If opentsdb is the only thing on this cluster, you
>>>>> can do the math pretty easily to find the maximum allowable, based on your
>>>>> heap size and accounting for the 40% (default) used for the block cache.
>>>>>
>>>>> On Fri, Jan 24, 2014 at 9:38 PM, Rohit Dev wrote:
>>>>>
>>>>>> Hi Kevin,
>>>>>>
>>>>>> We have about 160 regions per server with a 16GB region size and 10
>>>>>> drives for HBase. I've looked at disk IO and that doesn't seem to be
>>>>>> a problem (% utilization is < 2 across all disks).
>>>>>>
>>>>>> Any suggestion what heap size I should allocate? Normally I allocate
>>>>>> 16GB.
>>>>>>
>>>>>> Also, I read that increasing hbase.hstore.blockingStoreFiles and
>>>>>> hbase.hregion.memstore.block.multiplier is a good idea for a write-heavy
>>>>>> cluster, but in my case it seems to be heading in the wrong direction.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, Jan 24, 2014 at 6:31 PM, Kevin O'dell wrote:
>>>>>>> Rohit,
>>>>>>>
>>>>>>> A 64GB heap is not ideal; you will run into some weird issues. How many
>>>>>>> regions are you running per server, how many drives in each node, any
>>>>>>> other settings changed from default?
>>>>>>> On Jan 24, 2014 6:22 PM, "Rohit Dev" wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are running Opentsdb on a CDH 4.3 HBase cluster, with mostly
>>>>>>>> default settings. The cluster is write-heavy and I'm trying to see
>>>>>>>> what parameters I can tune to optimize write performance.
>>>>>>>>
>>>>>>>> # I get messages related to Memstore[1] and Slow Response[2] very
>>>>>>>> often; is this an indication of any issue?
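[Editorial note: a sketch of the "do the math" Bryan suggests above. The
0.4 aggregate memstore fraction (hbase.regionserver.global.memstore.upperLimit
default) and the count of concurrently written regions are assumptions; the
heap size is the 32GB mentioned earlier in the thread.]

```python
# With 40% of heap for block cache and (by default) up to 40% for
# memstores, the aggregate memstore budget bounds how large a per-table
# flush size can be before concurrent writers exhaust it.
HEAP_BYTES = 32 * 1024**3     # 32 GB heap (from thread)
MEMSTORE_FRACTION = 0.4       # assumed global memstore upper limit
ACTIVE_REGIONS = 25           # assumption: regions taking writes at once

memstore_budget = HEAP_BYTES * MEMSTORE_FRACTION          # ~12.8 GB
max_flush_size = memstore_budget / ACTIVE_REGIONS         # ~524 MB
print(memstore_budget / 1024**3, max_flush_size / 1024**2)
```

Under these assumptions a 512 MB flush size supports roughly 25 actively
written regions before the global memstore limit starts forcing flushes; for
opentsdb, where writes concentrate in the recent time window, that is usually
enough.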
>>>>>>>>
>>>>>>>> I tried increasing some parameters on one node:
>>>>>>>> - hbase.hstore.blockingStoreFiles - from the default 7 to 15
>>>>>>>> - hbase.hregion.memstore.block.multiplier - from the default 2 to 8
>>>>>>>> - and heap size from 16GB to 64GB
>>>>>>>>
>>>>>>>> * The 'Compaction queue' went up to ~200 within 60 minutes after restarting
>>>>>>>> the region server with the new parameters, and the log started to get even
>>>>>>>> more noisy.
>>>>>>>>
>>>>>>>> Can anyone please suggest whether I'm going in the right direction with
>>>>>>>> these new settings? Or are there other things I could monitor or
>>>>>>>> change to make it better?
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
>>>>>>>> for 'IPC Server handler 19 on 60020' on region
>>>>>>>> tsdb,\x008XR\xE0i\x90\x00\x00\x02Q\x7F\x1D\x00\x00(\x00\x0B]\x00\x008M(r\x00\x00Bl\xA7\x8C,1390556781703.0771bf90cab25c503d3400206417f6bf.:
>>>>>>>> memstore size 256.3 M is >= than blocking 256 M size
>>>>>>>>
>>>>>>>> [2]
>>>>>>>> WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
>>>>>>>> {"processingtimems":17887,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@586940ea),
>>>>>>>> rpc version=1, client version=29,
>>>>>>>> methodsFingerPrint=0","client":"192.168.10.10:54132",
>>>>>>>> "starttimems":1390587959182,"queuetimems":1498,"class":"HRegionServer","responsesize":0,"method":"multi"}
>>>>>
>>>>>
>
> Confidentiality Notice: The information contained in this message, including
> any attachments hereto, may be confidential and is intended to be read only
> by the individual or entity to whom this message is addressed. If the reader
> of this message is not the intended recipient or an agent or designee of the
> intended recipient, please note that any review, use, disclosure or
> distribution of this message or its attachments, in any form, is strictly
> prohibited.
> If you have received this message in error, please immediately notify the
> sender and/or Notifications@carrieriq.com and delete or destroy any copy of
> this message and its attachments.