Subject: Re: Cluster Wide Pauses
From: Christopher Tarnas <cft@tarnas.org>
To: user@hbase.apache.org
Date: Fri, 14 Jan 2011 11:53:45 -0600

Thanks - I was not sure, and I had not received a response from the list to my
related question earlier this week. It does seem like compactions are related
to my problem. If I understand correctly, does raising
hbase.hregion.memstore.block.multiplier give the region more of a buffer
before writes are blocked while compactions happen?

I'm writing via Thrift (about 30 clients) to a 5-node cluster when I see this
problem. There is no I/O wait, so I don't think it is disk bound, and it is
not CPU starved. I'm waiting on IT to get me access to Ganglia for the network
info.

-chris
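P.S. For anyone following along, this is roughly what the tuning from my
earlier mail looks like in hbase-site.xml on the region servers. The values
are just what I have been experimenting with on my own hardware, not a
recommendation, and as far as I know both settings only take effect after a
regionserver restart:

    <!-- hbase-site.xml (regionserver side); values are from my own testing -->
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <!-- block updates once a region's memstores grow past this many times
           the flush size; the default is 2 -->
      <value>12</value>
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <!-- block updates once a store has more than this many store files
           waiting on compaction -->
      <value>16</value>
    </property>

Raising them trades memory and a longer compaction backlog for fewer
update-blocking episodes, which is why it only made sense for me on 48 GB
nodes.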
On Fri, Jan 14, 2011 at 11:29 AM, Jonathan Gray wrote:
> These are a different kind of pause (those caused by blockingStoreFiles).
>
> This is HBase stepping in and actually blocking updates to a region because
> compactions have not been able to keep up with the write load. It could
> manifest itself in the same way, but this is different from the shorter
> pauses caused by periodic offlining of regions during balancing and splits.
>
> Wayne, have you confirmed in your RegionServer logs that the pauses are
> associated with splits or region movement, and that you are not seeing the
> blocking store files issue?
>
> JG
>
> > -----Original Message-----
> > From: cft@tarnas.org [mailto:cft@tarnas.org] On Behalf Of Christopher Tarnas
> > Sent: Friday, January 14, 2011 7:29 AM
> > To: user@hbase.apache.org
> > Subject: Re: Cluster Wide Pauses
> >
> > I have been seeing similar problems, and found that by raising
> > hbase.hregion.memstore.block.multiplier to above 12 (the default is two)
> > and hbase.hstore.blockingStoreFiles to 16, I managed to reduce the
> > frequency of the pauses during loads. My nodes are pretty beefy (48 GB of
> > RAM), so I had room to experiment.
> >
> > From what I understand, that gave the regionservers more buffer before
> > they had to halt the world to catch up. The pauses still happen, but
> > their impact is less now.
> >
> > -chris
> >
> > On Fri, Jan 14, 2011 at 8:34 AM, Wayne wrote:
> > > We have not found any smoking gun here. Most likely these are region
> > > splits on a quickly growing/hot region that all clients get caught
> > > waiting for.
> > >
> > > On Thu, Jan 13, 2011 at 7:49 AM, Wayne wrote:
> > > > Thank you for the lead! We will definitely look closer at the OS logs.
> > > >
> > > > On Thu, Jan 13, 2011 at 6:59 AM, Tatsuya Kawano wrote:
> > > >>
> > > >> Hi Wayne,
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them.
> > > >>
> > > >> Have you checked this article from Andrei and Cosmin? They had a busy
> > > >> firewall that caused a network blackout.
> > > >>
> > > >> http://hstack.org/hbase-performance-testing/
> > > >>
> > > >> Maybe it's not your case, but just to be sure.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> --
> > > >> Tatsuya Kawano (Mr.)
> > > >> Tokyo, Japan
> > > >>
> > > >> On Jan 13, 2011, at 4:52 AM, Wayne wrote:
> > > >>
> > > >> > We are seeing some TCP Resets on all nodes at the same time, and
> > > >> > sometimes quite a lot of them. We have yet to correlate the pauses
> > > >> > to the TCP resets, but I am starting to wonder if this is partly a
> > > >> > network problem. Does Gigabit Ethernet break down on high volume
> > > >> > nodes? Do high volume nodes use 10G or InfiniBand?
> > > >> >
> > > >> > On Wed, Jan 12, 2011 at 1:52 PM, Stack wrote:
> > > >> >
> > > >> >> Jon asks that you describe your loading in the issue. Would you
> > > >> >> mind doing so. Ted, stick up in the issue the workload and configs
> > > >> >> you are running if you don't mind. I'd like to try it over here.
> > > >> >> Thanks lads,
> > > >> >> St.Ack
> > > >> >>
> > > >> >> On Wed, Jan 12, 2011 at 9:03 AM, Wayne wrote:
> > > >> >>> Added: https://issues.apache.org/jira/browse/HBASE-3438.
> > > >> >>>
> > > >> >>> On Wed, Jan 12, 2011 at 11:40 AM, Wayne wrote:
> > > >> >>>
> > > >> >>>> We are using 0.89.20100924, r1001068.
> > > >> >>>>
> > > >> >>>> We see it during heavy write load (which is all the time), but
> > > >> >>>> yesterday we had read load as well as write load and saw both
> > > >> >>>> reads and writes stop for 10+ seconds. The region size is the
> > > >> >>>> biggest clue we have found from our tests: setting up a new
> > > >> >>>> cluster with a 1GB max region size and starting to load heavily,
> > > >> >>>> we will see this a lot, for long time frames.
> > > >> >>>> Maybe the bigger file gets hung up more easily with a split?
> > > >> >>>> Your description below also fits, in that early on the load is
> > > >> >>>> not well balanced, so it is easier to stop everything on one
> > > >> >>>> node. I will file a JIRA. I will also try to dig deeper into the
> > > >> >>>> logs during the pauses to find a node that might be stuck in a
> > > >> >>>> split.
> > > >> >>>>
> > > >> >>>> On Wed, Jan 12, 2011 at 11:17 AM, Stack wrote:
> > > >> >>>>
> > > >> >>>>> On Tue, Jan 11, 2011 at 2:34 PM, Wayne wrote:
> > > >> >>>>>> We have very frequent cluster wide pauses that stop all reads
> > > >> >>>>>> and writes for seconds.
> > > >> >>>>>
> > > >> >>>>> All reads and all writes?
> > > >> >>>>>
> > > >> >>>>> I've seen the pause too for writes. It's something I've always
> > > >> >>>>> meant to look into. Friso postulates one cause. Another that
> > > >> >>>>> we've talked of is a region taking a while to come back online
> > > >> >>>>> after a split or a rebalance for whatever reason. Client
> > > >> >>>>> loading might be 'random', spraying over lots of random
> > > >> >>>>> regions, but they all get stuck waiting on one particular
> > > >> >>>>> region to come back online.
> > > >> >>>>>
> > > >> >>>>> I suppose reads could be blocked for the same reason if all are
> > > >> >>>>> trying to read from the offlined region.
> > > >> >>>>>
> > > >> >>>>> What version of hbase are you using? Splits should be faster in
> > > >> >>>>> 0.90 now that the split daughters come up on the same
> > > >> >>>>> regionserver.
> > > >> >>>>>
> > > >> >>>>> Sorry I don't have a better answer for you. Need to dig in.
> > > >> >>>>>
> > > >> >>>>> File a JIRA. If you want to help out some, stick some data up
> > > >> >>>>> in it. Some suggestions would be to enable logging of when we
> > > >> >>>>> look up region locations in the client and then note when
> > > >> >>>>> requests go to zero. Can you figure out what region the clients
> > > >> >>>>> are waiting on (if they are waiting on any)? If you can pull
> > > >> >>>>> out a particular one, try and elicit its history at the time of
> > > >> >>>>> blockage. Is it being moved or mid-split? I suppose it makes
> > > >> >>>>> sense that bigger regions would make the situation 'worse'. I
> > > >> >>>>> can take a look at it too.
> > > >> >>>>>
> > > >> >>>>> St.Ack
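[An aside from me, in case anyone tries Stack's logging suggestion above: I
believe the client-side region location lookups are logged at DEBUG under the
org.apache.hadoop.hbase.client package, so something like the following line
in the client's log4j.properties should surface them. This is an untested
guess on my part, and it will be fairly chatty under load.]

    # client-side log4j.properties; the logger name is my best guess
    log4j.logger.org.apache.hadoop.hbase.client=DEBUG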
> > > >> >>>>>> We are constantly loading data to this cluster of 10 nodes.
> > > >> >>>>>> These pauses can happen as frequently as every minute, but
> > > >> >>>>>> sometimes are not seen for 15+ minutes. Basically, watching
> > > >> >>>>>> the region server list with request counts is the only
> > > >> >>>>>> evidence of what is going on. All reads and writes totally
> > > >> >>>>>> stop, and if there is ever any activity it is on the node
> > > >> >>>>>> hosting the .META. table, with a request count of region count
> > > >> >>>>>> + 1. This problem seems to be worse with a larger region size.
> > > >> >>>>>> We tried a 1GB region size and saw this more often than we saw
> > > >> >>>>>> actual activity (and stopped using the larger region size
> > > >> >>>>>> because of it). We went back to the default region size and it
> > > >> >>>>>> was better, but we had too many regions, so now we are up to
> > > >> >>>>>> 512M for a region size and we are seeing it more again.
> > > >> >>>>>>
> > > >> >>>>>> Does anyone know what this is? We have dug into all of the
> > > >> >>>>>> logs to find some sort of pause but are not able to find
> > > >> >>>>>> anything. Is this a WAL (HLog) roll? Is this a region split or
> > > >> >>>>>> compaction? Of course our biggest fear is a GC pause on the
> > > >> >>>>>> master, but we do not have JVM GC logging turned on for the
> > > >> >>>>>> master to tell. What could possibly stop the entire cluster
> > > >> >>>>>> from working for seconds at a time, very frequently?
> > > >> >>>>>>
> > > >> >>>>>> Thanks in advance for any ideas of what could be causing this.
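One more note from me, on Wayne's GC question just above: GC logging can be
turned on from hbase-env.sh. A minimal sketch, assuming HBASE_OPTS is picked
up by both the master and regionserver JVMs in your install and that the log
path exists (both are assumptions on my part); it only takes effect after the
daemons are restarted:

    # hbase-env.sh: illustrative GC logging flags; the log path is an assumption
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

With that in place, a long stop-the-world collection on the master (or a
regionserver) should show up in the GC log with a pause time that lines up
with a cluster-wide stall.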