Subject: Re: more regionservers does not improve performance
From: Suraj Varma <svarma.ng@gmail.com>
To: user@hbase.apache.org
Date: Fri, 12 Oct 2012 19:30:49 -0700

Hi Jonathan:
What specific metric on ganglia did you notice for "IO is spiking"? Is it
your disk IO? Is your disk swapping? Do you see cpu iowait spikes?

I see you have given 8g to the RegionServer ... how much RAM is available
in total on that node? What heap are the individual mappers & DN set to run
with (i.e. check whether you are overallocated on heap when the _mappers_
run ... causing disk swapping ... leading to IO)?

There can be multiple causes ... so you may need to look at the ganglia
stats and narrow the bottleneck down as described in
http://hbase.apache.org/book/casestudies.perftroub.html

Here's a good reference for all the memstore related tweaks you can try
(and also to understand what each configuration means):
http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/

Also, provide more details on your schema (CFs, row size), Put sizes, etc.
as well, to see if that triggers an idea from the list.
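For concreteness, these are the kind of hbase-site.xml properties that post
(and the rest of this thread) is talking about. The values below are only
placeholders to show the shape of the config, not recommendations for your
cluster:

  <!-- Illustrative values only; size these against your RS heap and write pattern. -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <!-- per-region memstore flush threshold, in bytes -->
    <value>134217728</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <!-- updates to a region block once its memstore reaches multiplier * flush.size -->
    <value>4</value>
  </property>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <!-- updates block when a store has more than this many store files,
         until compaction catches up -->
    <value>20</value>
  </property>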
--S

On Fri, Oct 12, 2012 at 12:46 PM, Bryan Beaudreault wrote:
> I recommend turning on debug logging on your region servers. You may need
> to tune down certain packages back to info, because there are a few spammy
> ones, but overall it helps.
>
> You should see messages such as "12/10/09 14:22:57 INFO
> regionserver.HRegion: Blocking updates for 'IPC Server handler 41 on 60020'
> on region XXX: memstore size 256.0m is >= than blocking 256.0m size". As
> you can see, this is an INFO anyway, so you should be able to see it now if
> it is happening.
>
> You can try upping the number of IPC handlers and the memstore flush
> threshold. Also, maybe you are bottlenecked by the WAL. Try doing
> put.setWriteToWAL(false), just to see if it increases performance. If so,
> and you want to be a bit more safe with regard to the WAL, you can try
> turning on deferred flush on your table. I don't really know how to
> increase performance of the WAL aside from that, if this does seem to have
> an effect.
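> For reference, very roughly the client-side calls I mean (just a sketch
> against the 0.92/0.94-style API; the table name, CF and values here are
> made up):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.HTableDescriptor;
>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   Configuration conf = HBaseConfiguration.create();
>   HTable table = new HTable(conf, "mytable");   // made-up table name
>
>   // Throughput test only: skipping the WAL loses data if a RS dies.
>   Put put = new Put(Bytes.toBytes("some-row"));
>   put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
>   put.setWriteToWAL(false);
>   table.put(put);
>
>   // Safer middle ground: keep the WAL but defer syncing it, per table.
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
>   desc.setDeferredLogFlush(true);
>   admin.disableTable("mytable");
>   admin.modifyTable(Bytes.toBytes("mytable"), desc);
>   admin.enableTable("mytable");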
>
> On Fri, Oct 12, 2012 at 3:15 PM, Jonathan Bishop wrote:
>
>> Kevin,
>>
>> Sorry, I am fairly new to HBase. Can you be specific about what settings I
>> can change, and also where they are specified?
>>
>> Pretty sure I am not hotspotting, and increasing the memstore does not seem
>> to have any effect.
>>
>> I do not see any messages in my regionserver logs concerning blocking.
>>
>> I am suspecting that I am hitting some limit in our grid, but would like to
>> know where that limit is being imposed.
>>
>> Jon
>>
>> On Fri, Oct 12, 2012 at 6:44 AM, Kevin O'dell wrote:
>>
>> > Jonathan,
>> >
>> > Let's take a deeper look here.
>> >
>> > What is your memstore set at for the table/CF in question? Let's compare
>> > that value with the flush size you are seeing for your regions. If they
>> > are really small flushes, is it all to the same region? If so, that is
>> > going to be a schema issue. If they are full flushes, you can up your
>> > memstore, assuming you have the heap to cover it. If they are smaller
>> > flushes but to different regions, you are most likely suffering from
>> > global limit pressure and flushing too soon.
>> >
>> > Are you flushing prematurely due to HLogs rolling? Take a look for "too
>> > many hlogs" and look at the flushes. It may benefit you to raise that
>> > value.
>> >
>> > Are you blocking? As Suraj was saying, you may be blocking in 90-second
>> > blocks. Check the RS logs for those messages as well and then follow
>> > Suraj's advice.
>> >
>> > This is where I would start to optimize your write path. I hope the above
>> > helps.
>> >
>> > On Fri, Oct 12, 2012 at 3:34 AM, Suraj Varma wrote:
>> >
>> > > What have you configured your hbase.hstore.blockingStoreFiles and
>> > > hbase.hregion.memstore.block.multiplier to? Both of these block updates
>> > > when the limit is hit. Try increasing these to, say, 20 and 4 from the
>> > > defaults of 7 and 2 and see if it helps.
>> > >
>> > > If this still doesn't help, see if you can set up ganglia to get a
>> > > better insight into what is bottlenecking.
>> > > --Suraj
>> > >
>> > > On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra wrote:
>> > > > OK, looks like I missed out reading that part in your original mail.
>> > > > Did you try some of the compaction tweaks and configurations for your
>> > > > data, as explained in the following link?
>> > > > http://hbase.apache.org/book/regions.arch.html#compaction
>> > > >
>> > > > Also, how much data are you putting into the regions, and how big is
>> > > > one region at the end of data ingestion?
>> > > >
>> > > > Thanks and Regards
>> > > > Pankaj Misra
>> > > >
>> > > > -----Original Message-----
>> > > > From: Jonathan Bishop [mailto:jbishop.rwc@gmail.com]
>> > > > Sent: Friday, October 12, 2012 12:04 PM
>> > > > To: user@hbase.apache.org
>> > > > Subject: RE: more regionservers does not improve performance
>> > > >
>> > > > Pankaj,
>> > > >
>> > > > Thanks for the reply.
>> > > >
>> > > > Actually, I am using MD5 hashing to evenly spread the keys among the
>> > > > splits, so I don't believe there is any hotspot. In fact, when I
>> > > > monitor the web UI for HBase I see a very even load on all the
>> > > > regionservers.
>> > > >
>> > > > Jon
>> > > >
>> > > > Sent from my Windows 8 PC <http://windows.microsoft.com/consumer-preview>
>> > > >
>> > > > *From:* Pankaj Misra
>> > > > *Sent:* Thursday, October 11, 2012 8:24:32 PM
>> > > > *To:* user@hbase.apache.org
>> > > > *Subject:* RE: more regionservers does not improve performance
>> > > >
>> > > > Hi Jonathan,
>> > > >
>> > > > What seems to me is that, while doing the split across all 40 mappers,
>> > > > the keys are not randomized enough to leverage multiple regions and
>> > > > the pre-split strategy. This may be happening because all 40 mappers
>> > > > may be trying to write onto a single region for some time, making it a
>> > > > HOT region, till the key falls into another region, and then the other
>> > > > region becomes a HOT region; hence you may be seeing a high impact of
>> > > > compaction cycles reducing your throughput.
>> > > >
>> > > > Are the keys incremental? Are the keys randomized enough across the
>> > > > splits?
>> > > >
>> > > > Ideally, when all 40 mappers are running, you should see all the
>> > > > regions being filled up in parallel for maximum throughput. Hope it
>> > > > helps.
>> > > >
>> > > > Thanks and Regards
>> > > > Pankaj Misra
>> > > >
>> > > > ________________________________________
>> > > > From: Jonathan Bishop [jbishop.rwc@gmail.com]
>> > > > Sent: Friday, October 12, 2012 5:38 AM
>> > > > To: user@hbase.apache.org
>> > > > Subject: more regionservers does not improve performance
>> > > >
>> > > > Hi,
>> > > >
>> > > > I am running a MR job with 40 simultaneous mappers, each of which does
>> > > > puts to HBase. I have ganged up the puts into groups of 1000 (this
>> > > > seems to help quite a bit) and also made sure that the table is
>> > > > pre-split into 100 regions, and that the row keys are randomized using
>> > > > MD5 hashing.
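>> > > > Roughly, the put path in each mapper looks like the sketch below.
>> > > > The table/CF names and the Record type are placeholders, not my real
>> > > > schema; "table" is the HTable opened in the mapper's setup(), and it
>> > > > needs java.security.MessageDigest, java.util.List/ArrayList and the
>> > > > HBase client Put/Bytes classes:
>> > > >
>> > > >   MessageDigest md5 = MessageDigest.getInstance("MD5");
>> > > >   List<Put> batch = new ArrayList<Put>(1000);
>> > > >   for (Record r : records) {
>> > > >     // 16-byte MD5 of the natural id, to spread keys across the splits
>> > > >     byte[] rowKey = md5.digest(Bytes.toBytes(r.getId()));
>> > > >     Put put = new Put(rowKey);
>> > > >     put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), r.getValueBytes());
>> > > >     batch.add(put);
>> > > >     if (batch.size() == 1000) {   // the "group of 1000" puts
>> > > >       table.put(batch);
>> > > >       batch.clear();
>> > > >     }
>> > > >   }
>> > > >   if (!batch.isEmpty()) {
>> > > >     table.put(batch);
>> > > >   }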
>> > > > My cluster size is 10, and I am allowing 4 mappers per tasktracker.
>> > > >
>> > > > In my MR job I know that the mappers are able to generate puts much
>> > > > faster than the puts can be handled in hbase. In other words, if I let
>> > > > the mappers run without doing hbase puts then everything scales as you
>> > > > would expect with the number of mappers created. It is the hbase puts
>> > > > which seem to be the bottleneck.
>> > > >
>> > > > What is strange is that I do not get much run time improvement by
>> > > > increasing the number of regionservers beyond about 4. Indeed, it
>> > > > seems that the system runs slower with 8 regionservers than with 4.
>> > > >
>> > > > I have added the following in hbase-env.sh hoping this would help
>> > > > (from the book HBase in Action):
>> > > >
>> > > > export HBASE_OPTS="-Xmx8g"
>> > > > export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
>> > > >
>> > > > # Uncomment below to enable java garbage collection logging in the .out file.
>> > > > export HBASE_OPTS="${HBASE_OPTS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"
>> > > >
>> > > > Monitoring hbase through the web UI, I see that there are pauses for
>> > > > flushing, which seems to run pretty quickly, and for compacting, which
>> > > > seems to take somewhat longer.
>> > > >
>> > > > Any advice for making this run faster would be greatly appreciated.
>> > > > Currently I am looking into installing Ganglia to better monitor my
>> > > > cluster, but have yet to get that running.
>> > > >
>> > > > I suspect an I/O issue as the regionservers do not seem terribly
>> > > > loaded.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jon
>> >
>> > --
>> > Kevin O'Dell
>> > Customer Operations Engineer, Cloudera
>>