Subject: Re: Cluster Wide Pauses
From: Christopher Tarnas <cft@tarnas.org>
To: user@hbase.apache.org
Date: Fri, 14 Jan 2011 11:53:45 -0600

Thanks - I was not sure, and I had not received a response from the list to my
related question earlier this week. It does seem like compactions are related
to my problem. If I understand correctly, does raising
hbase.hregion.memstore.block.multiplier give the region more of a buffer
before writes are blocked while compactions happen?

I'm writing via Thrift (about 30 clients) to a 5-node cluster when I see this
problem. There is no I/O wait, so I don't think it is disk bound, and it is
not CPU starved. I'm waiting on IT to get me access to Ganglia for the network
info.

-chris
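P.S. For anyone following along, this is roughly what the tuning from my
earlier mail looks like in hbase-site.xml on the region servers. The values
are just what I have been experimenting with on my own hardware, not a
recommendation, and as far as I know both settings only take effect after a
regionserver restart:

    <!-- hbase-site.xml (regionserver side); values are from my own testing -->
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <!-- block updates once a region's memstores grow past this many times
           the flush size; the default is 2 -->
      <value>12</value>
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <!-- block updates once a store has more than this many store files
           waiting on compaction -->
      <value>16</value>
    </property>

Raising them trades memory and a longer compaction backlog for fewer
update-blocking episodes, which is why it only made sense for me on 48 GB
nodes.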
On Fri, Jan 14, 2011 at 11:29 AM, Jonathan Gray wrote:
> These are a different kind of pause (those caused by blockingStoreFiles).
>
> This is HBase stepping in and actually blocking updates to a region because
> compactions have not been able to keep up with the write load. It could
> manifest itself in the same way, but this is different from the shorter
> pauses caused by periodic offlining of regions during balancing and splits.
>
> Wayne, have you confirmed in your RegionServer logs that the pauses are
> associated with splits or region movement, and that you are not seeing the
> blocking store files issue?
>
> JG
>
> > -----Original Message-----
> > From: cft@tarnas.org [mailto:cft@tarnas.org] On Behalf Of Christopher Tarnas
> > Sent: Friday, January 14, 2011 7:29 AM
> > To: user@hbase.apache.org
> > Subject: Re: Cluster Wide Pauses
> >
> > I have been seeing similar problems, and found that by raising
> > hbase.hregion.memstore.block.multiplier to above 12 (the default is two)
> > and hbase.hstore.blockingStoreFiles to 16, I managed to reduce the
> > frequency of the pauses during loads. My nodes are pretty beefy (48 GB of
> > RAM), so I had room to experiment.
> >
> > From what I understand, that gave the regionservers more buffer before
> > they had to halt the world to catch up. The pauses still happen, but
> > their impact is less now.
> >
> > -chris
> >
> > On Fri, Jan 14, 2011 at 8:34 AM, Wayne wrote:
> > > We have not found any smoking gun here. Most likely these are region
> > > splits on a quickly growing/hot region that all clients get caught
> > > waiting for.
> > >
> > > On Thu, Jan 13, 2011 at 7:49 AM, Wayne wrote:
> > > > Thank you for the lead! We will definitely look closer at the OS logs.
> > > >
> > > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano wrote:
> > > >>
> > > >> Hi Wayne,
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them.
> > > >>
> > > >> Have you checked this article from Andrei and Cosmin? They had a busy
> > > >> firewall that caused a network blackout.
> > > >>
> > > >> http://hstack.org/hbase-performance-testing/
> > > >>
> > > >> Maybe it's not your case, but just to be sure.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> --
> > > >> Tatsuya Kawano (Mr.)
> > > >> Tokyo, Japan
> > > >>
> > > >> On Jan 13, 2011, at 4:52 AM, Wayne wrote:
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them. We have yet to correlate the pauses
> > > >> > to the TCP resets, but I am starting to wonder if this is partly a
> > > >> > network problem. Does Gigabit Ethernet break down on high volume
> > > >> > nodes? Do high volume nodes use 10G or InfiniBand?
> > > >> >
> > > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack wrote:
> > > >> >
> > > >> >> Jon asks that you describe your loading in the issue. Would you
> > > >> >> mind doing so. Ted, stick up in the issue the workload and configs
> > > >> >> you are running if you don't mind. I'd like to try it over here.
> > > >> >> Thanks lads,
> > > >> >> St.Ack
> > > >> >>
> > > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne wrote:
> > > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > > >> >>>
> > > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne wrote:
> > > >> >>>
> > > >> >>>> We are using 0.89.20100924, r1001068.
> > > >> >>>>
> > > >> >>>> We see it during heavy write load (which is all the time), but
> > > >> >>>> yesterday we had read load as well as write load and saw both
> > > >> >>>> reads and writes stop for 10+ seconds. The region size is the
> > > >> >>>> biggest clue we have found from our tests: setting up a new
> > > >> >>>> cluster with a 1GB max region size and starting to load heavily,
> > > >> >>>> we will see this a lot, for long time frames.
> > > >> >>>> Maybe the bigger file gets hung up more easily with a split?
> > > >> >>>> Your description below also fits, in that early on the load is
> > > >> >>>> not well balanced, so it is easier to stop everything on one
> > > >> >>>> node. I will file a JIRA. I will also try to dig deeper into the
> > > >> >>>> logs during the pauses to find a node that might be stuck in a
> > > >> >>>> split.
> > > >> >>>>
> > > >> >>>> On Wed, Jan 12, 2011 at 11:17 AM, Stack wrote:
> > > >> >>>>
> > > >> >>>>> On Tue, Jan 11, 2011 at 2:34 PM, Wayne wrote:
> > > >> >>>>>> We have very frequent cluster wide pauses that stop all reads
> > > >> >>>>>> and writes for seconds.
> > > >> >>>>>
> > > >> >>>>> All reads and all writes?
> > > >> >>>>>
> > > >> >>>>> I've seen the pause too for writes. It's something I've always
> > > >> >>>>> meant to look into. Friso postulates one cause. Another that
> > > >> >>>>> we've talked of is a region taking a while to come back online
> > > >> >>>>> after a split or a rebalance for whatever reason. Client
> > > >> >>>>> loading might be 'random', spraying over lots of random
> > > >> >>>>> regions, but they all get stuck waiting on one particular
> > > >> >>>>> region to come back online.
> > > >> >>>>>
> > > >> >>>>> I suppose reads could be blocked for the same reason if all are
> > > >> >>>>> trying to read from the offlined region.
> > > >> >>>>>
> > > >> >>>>> What version of hbase are you using? Splits should be faster in
> > > >> >>>>> 0.90 now that the split daughters come up on the same
> > > >> >>>>> regionserver.
> > > >> >>>>>
> > > >> >>>>> Sorry I don't have a better answer for you. Need to dig in.
> > > >> >>>>>
> > > >> >>>>> File a JIRA. If you want to help out some, stick some data up
> > > >> >>>>> in it. Some suggestions would be to enable logging of when we
> > > >> >>>>> look up region locations in the client and then note when
> > > >> >>>>> requests go to zero. Can you figure out what region the clients
> > > >> >>>>> are waiting on (if they are waiting on any)? If you can pull
> > > >> >>>>> out a particular one, try and elicit its history at the time of
> > > >> >>>>> blockage. Is it being moved or mid-split? I suppose it makes
> > > >> >>>>> sense that bigger regions would make the situation 'worse'. I
> > > >> >>>>> can take a look at it too.
> > > >> >>>>>
> > > >> >>>>> St.Ack
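[An aside from me, in case anyone tries Stack's logging suggestion above: I
believe the client-side region location lookups are logged at DEBUG under the
org.apache.hadoop.hbase.client package, so something like the following line
in the client's log4j.properties should surface them. This is an untested
guess on my part, and it will be fairly chatty under load.]

    # client-side log4j.properties; the logger name is my best guess
    log4j.logger.org.apache.hadoop.hbase.client=DEBUG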
> > > >> >>>>>> We are constantly loading data to this cluster of 10 nodes.
> > > >> >>>>>> These pauses can happen as frequently as every minute, but
> > > >> >>>>>> sometimes are not seen for 15+ minutes. Basically, watching
> > > >> >>>>>> the region server list with request counts is the only
> > > >> >>>>>> evidence of what is going on. All reads and writes totally
> > > >> >>>>>> stop, and if there is ever any activity it is on the node
> > > >> >>>>>> hosting the .META. table, with a request count of region count
> > > >> >>>>>> + 1. This problem seems to be worse with a larger region size.
> > > >> >>>>>> We tried a 1GB region size and saw this more often than we saw
> > > >> >>>>>> actual activity (and stopped using the larger region size
> > > >> >>>>>> because of it). We went back to the default region size and it
> > > >> >>>>>> was better, but we had too many regions, so now we are up to
> > > >> >>>>>> 512M for a region size and we are seeing it more again.
> > > >> >>>>>>
> > > >> >>>>>> Does anyone know what this is? We have dug into all of the
> > > >> >>>>>> logs to find some sort of pause but are not able to find
> > > >> >>>>>> anything. Is this a WAL (HLog) roll? Is this a region split or
> > > >> >>>>>> compaction? Of course our biggest fear is a GC pause on the
> > > >> >>>>>> master, but we do not have JVM GC logging turned on for the
> > > >> >>>>>> master to tell. What could possibly stop the entire cluster
> > > >> >>>>>> from working for seconds at a time, very frequently?
> > > >> >>>>>>
> > > >> >>>>>> Thanks in advance for any ideas of what could be causing this.
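One more note from me, on Wayne's GC question just above: GC logging can be
turned on from hbase-env.sh. A minimal sketch, assuming HBASE_OPTS is picked
up by both the master and regionserver JVMs in your install and that the log
path exists (both are assumptions on my part); it only takes effect after the
daemons are restarted:

    # hbase-env.sh: illustrative GC logging flags; the log path is an assumption
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

With that in place, a long stop-the-world collection on the master (or a
regionserver) should show up in the GC log with a pause time that lines up
with a cluster-wide stall.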