hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Hbase pausing problems
Date Wed, 20 Jan 2010 19:26:00 GMT
Looking at the logs, I'd second what J-D says regarding the number of regions you are
carrying per server (800).  Enable compression and that'll shrink the number
and probably up your throughput all around, or make your regions larger (make
sure to up the memstore flush size in sympathy).
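For example, something along these lines in hbase-site.xml would double the
region size and bump the flush size to match -- the property names are the
0.20 ones and the values are only illustrative, so check hbase-default.xml
for your version's defaults:

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>536870912</value>
    <!-- 512MB regions instead of the 256MB default -->
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
    <!-- flush a region's memstore at 128MB instead of 64MB -->
  </property>

Compression itself is a per-column-family attribute (COMPRESSION on the
column descriptor), so it is set when creating or altering the table rather
than in hbase-site.xml.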

Your keys look like UUIDs, so my guess is that they are pretty well spread over
the key space -- that you are not beating up on one region continuously (the
first scenario J-D painted is probably what's happening).

I thought I could see what your schema was by looking in the logs, but that's
no longer the case, so please tell us more about it -- the number of column
families, and what one of your 6M inserts is comprised of.

I would suggest you not run HBase as root if you can avoid it.

I'll leave it at this for now.   Chatting with J-D about this issue, and given
that you are using UUIDs so load is being spread nice and evenly across your
cluster, you should try out his suggested 0.5/0.48 settings on 3G of RAM.
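In concrete terms that is roughly one line in hbase-env.sh plus two properties
in hbase-site.xml (values as J-D suggests; the heap figure is in MB):

  # hbase-env.sh
  export HBASE_HEAPSIZE=3000

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.5</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.48</value>
  </property>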

St.Ack

On Wed, Jan 20, 2010 at 10:55 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> A table is sorted by row key and all the regions are sequentially
> split, so a given row will always go to a single region, and if that
> region is unavailable for some reason then you can't write
> immediately. If your write pattern is distributed among the regions,
> they will all slowly synchronize on the hung region. This is probably
> why the writes stop.
>
> Or, if you are always writing row keys sequentially, then it's even
> _worse_ because all writes will always go to the same region, so
> there's no load distribution at all. Example: an incrementing row key.
>
> You also seem to have a lot of regions per region server, which plays
> into the global memstore size.
>
> Finally, I recommend setting:
>
> heap to 3G
> hbase.regionserver.global.memstore.upperLimit to 0.5
> hbase.regionserver.global.memstore.lowerLimit to 0.48
>
> This should help a lot.
>
> Thx,
>
> J-D
>
> On Wed, Jan 20, 2010 at 9:37 AM, Seraph Imalia <seraph@eisp.co.za> wrote:
> >
> >
> >
> >> From: stack <stack@duboce.net>
> >> Reply-To: <hbase-user@hadoop.apache.org>
> >> Date: Wed, 20 Jan 2010 07:26:58 -0800
> >> To: <hbase-user@hadoop.apache.org>
> >> Subject: Re: Hbase pausing problems
> >>
> >>> On Wed, Jan 20, 2010 at 1:06 AM, Seraph Imalia <seraph@eisp.co.za> wrote:
> >>
> >>>
> >>> The client stops being able to write to hBase as soon as 1 of the
> >>> regionservers starts doing this...
> >>>
> >>> 2010-01-17 01:16:25,729 INFO
> >>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
> >>> ChannelDelivery,5352f559-d68e-42e9-be92-8bae82185ed1,1262544772804 because
> >>> global memstore limit of 396.7m exceeded; currently 396.7m and flushing
> >>> till 247.9m
> >>>
> >> See hbase.regionserver.global.memstore.upperLimit and
> >> hbase.regionserver.global.memstore.lowerLimit.  The former is a prophylactic
> >> against OOME'ing.  The sum of all memory used by MemStores is not allowed to
> >> grow beyond 0.4 of total heap size (0.4 is the default).  The 247.9M figure
> >> in the above is 0.25 of the heap by default.  Writes are held up until
> >> sufficient MemStore space has been dumped by flushing.  You seem to be
> >> taking on writes at a rate that is in excess of the rate at which you can
> >> flush.  We'll take a look at your logs..... You might up the 0.25 to 0.3 or
> >> 0.32.  This will shorten the times we stop taking on writes but at the cost
> >> of increasing the number of times we disallow writes.
> >
> > Does this mean that when 1 regionserver does a memstore flush, the other
> two
> > regionservers are also unavailable for writes?  I have watched the logs
> > carefully to make sure that not all the regionservers are flushing at the
> > same time.  Most of the time, only 1 server flushes at a time and in rare
> > cases, I have seen two at a time.
> >
> >>
> >> It also looks like you have little RAM space given over to hbase, just
> 1G?
> >> If your traffic is bursty, giving hbase more RAM might help it get over
> >> these write humps.
> >
> > I have it at 1G on purpose.  When we first had the problem, I immediately
> > thought the problem was resource-related, so I increased the hBase RAM to 3G
> > (each server has 8G - I was careful to watch for swapping).  This made the
> > problem worse because each memstore flush took longer, which stopped writing
> > for longer, and people started noticing that our system was down during those
> > periods.  Granted, the period between flushes was longer, but the effect was
> > that people started to notice our downtime.  So I have put the RAM back down
> > to 1G to minimize the negative effects on the live system, and fewer people
> > notice it.
> >
> >
> >>
> >>
> >>
> >>> Or this...
> >>>
> >>> 2010-01-17 01:16:26,159 INFO
> >>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of
> >>> AdDelivery,613a401d-fb8a-42a9-aac6-d957f6281035,1261867806692 because
> >>> global memstore limit of 396.7m exceeded; currently 390.4m and flushing
> >>> till 247.9m
> >>>
> >>> This is a by-product of the above hitting 'global limit'.
> >>
> >>
> >>
> >>> And then as soon as it finishes that, it starts doing this...
> >>>
> >>> 2010-01-17 01:16:36,709 DEBUG
> >>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> >>> requested for region
> >>> AdDelivery,fb98f6c9-db13-4853-92ee-ffe1182fffd0,1263544763046/350999600
> >>> because: regionserver/192.168.2.88:60020.cacheFlusher
> >>>
> >> These are 'normal'.  We are logging the fact that a compaction has been
> >> requested on a region.  This does not get in the way of our taking on
> >> writes (not directly).
> >>
> >>
> >>
> >>> And as soon as it has finished the last of the Compaction Requests, the
> >>> client recovers and the regionserver starts doing this...
> >>>
> >>> 2010-01-17 01:16:36,713 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> >>> Compaction size of ChannelDelivery_Family: 209.5m; Skipped 1 file(s),
> >>> size: 216906650
> >>> 2010-01-17 01:16:36,713 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> >>> Started compaction of 3 file(s) into
> >>> /hbase/ChannelDelivery/compaction.dir/165262792, seqid=1241653592
> >>> 2010-01-17 01:16:37,143 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> >>> Completed compaction of ChannelDelivery_Family; new storefile is
> >>> hdfs://dynobuntu6:8020/hbase/ChannelDelivery/165262792/ChannelDelivery_Family/1673693545539520912;
> >>> store size is 209.5m
> >>>
> >>
> >> Above is 'normal'.  At DEBUG you see detail on hbase going about its
> >> business.
> >>
> >>
> >>>
> >>> All of these logs seem perfectly acceptable to me - the problem is that it
> >>> just requires one of the regionservers to start doing this for the client
> >>> to be prevented from inserting new rows into hBase.  The logs don't seem
> >>> to explain why this is happening.
> >>>
> >>>
> >> Clients will be blocked writing to regions carried by the affected
> >> regionserver only.  Your HW is not appropriate to the load as currently
> >> set up.  You might also consider adding more machines to your cluster.
> >>
> >
> > Hmm... How does hBase decide which region to write to?  Is it possible that
> > hBase is deciding to write all our current records to one specific region
> > that happens to be on the server that is busy doing a memstore flush?
> >
> > We are currently inserting about 6 million rows per day.  SQL Server (which
> > I am so happy to no longer be using for this) was able to write (and
> > replicate to a slave) 9 million records (using the same spec'ed server).  I
> > would like to see hBase cope with the 3 servers we have given it, at least
> > when inserting 6 million.  Do you think this is possible, or is our only
> > answer to throw on more servers?
> >
> > Seraph
> >
> >> St.Ack
> >>
> >>
> >>
> >>> Thank you for your assistance thus far; please let me know if you need or
> >>> discover anything else.
> >>>
> >>> Regards,
> >>> Seraph
> >>>
> >>>
> >>>
> >>>> From: Jean-Daniel Cryans <jdcryans@apache.org>
> >>>> Reply-To: <hbase-user@hadoop.apache.org>
> >>>> Date: Mon, 18 Jan 2010 09:49:16 -0800
> >>>> To: <hbase-user@hadoop.apache.org>
> >>>> Subject: Re: Hbase pausing problems
> >>>>
> >>>> The next step would be to take a look at your region server's log
> >>>> around the time of the insert pauses and of the clients not resuming
> >>>> after the loss of a region server. If you are able to gzip the logs
> >>>> and put them on a public server, it would be awesome.
> >>>>
> >>>> Thx,
> >>>>
> >>>> J-D
> >>>>
> >>>> On Mon, Jan 18, 2010 at 1:03 AM, Seraph Imalia <seraph@eisp.co.za> wrote:
> >>>>> Answers below...
> >>>>>
> >>>>> Regards,
> >>>>> Seraph
> >>>>>
> >>>>>> From: stack <stack@duboce.net>
> >>>>>> Reply-To: <hbase-user@hadoop.apache.org>
> >>>>>> Date: Fri, 15 Jan 2010 10:10:39 -0800
> >>>>>> To: <hbase-user@hadoop.apache.org>
> >>>>>> Subject: Re: Hbase pausing problems
> >>>>>>
> >>>>>> How many CPUs?
> >>>>>
> >>>>> 1x Quad Xeon in each server
> >>>>>
> >>>>>>
> >>>>>> You are using default JVM settings (see HBASE_OPTS in hbase-env.sh).  You
> >>>>>> might want to enable GC logging.  See the GC logging line further down in
> >>>>>> hbase-env.sh and enable it.  GC logging might tell you about the pauses
> >>>>>> you are seeing.
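> >>>>>> The stock hbase-env.sh ships that line commented out; it is something
> >>>>>> along these lines (pick whatever log path suits you):
> >>>>>>
> >>>>>>   export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
> >>>>>>       -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc-hbase.log"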
> >>>>>
> >>>>> I will enable GC logging tonight during our slow time, because restarting
> >>>>> the regionservers causes the clients to pause indefinitely.
> >>>>>
> >>>>>>
> >>>>>> Can you get a fourth server for your cluster and run the master, zk, and
> >>>>>> namenode on it, and leave the other three servers for regionserver and
> >>>>>> datanode (with perhaps replication == 2 as per J-D to lighten load on a
> >>>>>> small cluster)?
> >>>>>
> >>>>> We plan to double the number of servers in the next few weeks, and I will
> >>>>> take your advice to put the master, zk and namenode on one of them (we
> >>>>> will need to have a second one on standby should this one crash).  The
> >>>>> servers will be ordered shortly and will be here in a week or two.
> >>>>>
> >>>>> That said, I have been monitoring CPU usage and none of them seem
> >>>>> particularly busy.  The regionserver on each one hovers around 30% all the
> >>>>> time and the datanode sits at about 10% most of the time.  If we do have a
> >>>>> resource issue, it definitely does not seem to be CPU.
> >>>>>
> >>>>> Increasing RAM did not seem to work either - it just made hBase use a
> >>>>> bigger memstore, and then it took longer to do a flush.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> More notes inline in below.
> >>>>>>
> >>>>>> On Fri, Jan 15, 2010 at 1:33 AM, Seraph Imalia <seraph@eisp.co.za> wrote:
> >>>>>>
> >>>>>>> Approximately every 10 minutes, our entire coldfusion system pauses at
> >>>>>>> the point of inserting into hBase for between 30 and 60 seconds and
> >>>>>>> then continues.
> >>>>>>>
> >>>>>> Yeah, enable GC logging.  See if you can make a correlation between the
> >>>>>> pause the client is seeing and a GC pause.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Investigation...
> >>>>>>>
> >>>>>>> Watching the logs of the regionserver, the pausing of the coldfusion
> >>>>>>> system happens as soon as one of the regionservers starts flushing the
> >>>>>>> memstore, and recovers again as soon as it is finished flushing
> >>>>>>> (recovers as soon as it starts compacting).
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ...though, this would seem to point to an issue with your hardware.  How
> >>>>>> many disks?  Are they misconfigured such that they hold up the system
> >>>>>> when they are being heavily written to?
> >>>>>>
> >>>>>>
> >>>>>> A regionserver log at DEBUG from around this time, so we could look at
> >>>>>> it, would be helpful.
> >>>>>>
> >>>>>>
> >>>>>>> I can recreate the error just by stopping 1 of the regionservers; but
> >>>>>>> then starting the regionserver again does not make coldfusion recover
> >>>>>>> until I restart the coldfusion servers.  It is important to note that
> >>>>>>> if I keep the built-in hBase shell running, it is happily able to put
> >>>>>>> and get data to and from hBase whilst coldfusion is busy
> >>>>>>> pausing/failing.
> >>>>>>>
> >>>>>>
> >>>>>> This seems odd.  Enable DEBUG for the client-side.  Do you see the shell
> >>>>>> recalibrating, finding new locations for regions, after you shut down the
> >>>>>> single regionserver -- something that your coldfusion is not doing?  Or,
> >>>>>> maybe, the shell is putting to a regionserver that has not been disturbed
> >>>>>> by your start/stop?
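> >>>>>> For the client side that is just a matter of turning up the hbase logger
> >>>>>> in the log4j.properties your client uses, e.g.:
> >>>>>>
> >>>>>>   log4j.logger.org.apache.hadoop.hbase=DEBUG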
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> I have tried increasing the regionserver's RAM to 3 Gigs and this just
> >>>>>>> made the problem worse because it took longer for the regionservers to
> >>>>>>> flush the memory store.
> >>>>>>
> >>>>>>
> >>>>>> Again, if flushing is holding up the machine, if you can't write a file
> >>>>>> in the background without it freezing your machine, then your machines
> >>>>>> are anemic or misconfigured?
> >>>>>>
> >>>>>>
> >>>>>>> One of the links I found on your site mentioned increasing
> >>>>>>> the default value for hbase.regionserver.handler.count to 100 -- this
> >>>>>>> did not seem to make any difference.
> >>>>>>
> >>>>>>
> >>>>>> Leave this configuration in place I'd say.
> >>>>>>
> >>>>>> Are you seeing 'blocking' messages in the regionserver logs?  The
> >>>>>> regionserver will stop taking on writes if it thinks it's being overrun,
> >>>>>> to prevent itself OOME'ing.  Grep the 'multiplier' configuration in
> >>>>>> hbase-default.xml.
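> >>>>>> If I am remembering the property name right it is the one below; the
> >>>>>> value here is only illustrative -- check your hbase-default.xml:
> >>>>>>
> >>>>>>   <property>
> >>>>>>     <name>hbase.hregion.memstore.block.multiplier</name>
> >>>>>>     <value>4</value>
> >>>>>>     <!-- block updates on a region once its memstore reaches
> >>>>>>          multiplier x hbase.hregion.memstore.flush.size -->
> >>>>>>   </property>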
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> I have double-checked that the memory flush very rarely happens on more
> >>>>>>> than 1 regionserver at a time -- in fact, in my many hours of staring at
> >>>>>>> tails of logs, it only happened once where two regionservers flushed at
> >>>>>>> the same time.
> >>>>>>>
> >>>>>>>
> >>>>>>> You've enabled DEBUG?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> My investigations point strongly towards a coding problem on our side
> >>>>>>> rather than a problem with the server setup or hBase itself.
> >>>>>>
> >>>>>>
> >>>>>> If things were slow from the client's perspective, that might be a
> >>>>>> client-side coding problem, but these pauses, unless you have a fly-by
> >>>>>> deadlock in your client-code, are probably an hbase issue.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>  I say this because whilst I understand why a regionserver would go
> >>>>>>> offline during a memory flush, I would expect the other two
> >>>>>>> regionservers to pick up the load -- especially since the built-in
> >>>>>>> hbase shell has no problem accessing hBase whilst a regionserver is
> >>>>>>> busy doing a memstore flush.
> >>>>>>>
> >>>>>> HBase does not go offline during memory flush.  It continues to be
> >>>>>> available for reads and writes during this time.  And see J-D's response
> >>>>>> regarding the incorrect understanding of how the loading of regions is
> >>>>>> done in an hbase cluster.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>>
> >>>>>>> I think either I am leaving out code that is required to determine
> >>>>>>> which RegionServers are available OR I am keeping too many hBase
> >>>>>>> objects in RAM instead of calling their constructors each time (my
> >>>>>>> purpose obviously was to improve performance).
> >>>>>>>
> >>>>>>>
> >>>>>> For sure keep a single instance of HBaseConfiguration at least, and use
> >>>>>> it when constructing all HTable and HBaseAdmin instances.
> >>>>>>
> >>>>>>
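> >>>>>> Something along these lines is what I mean -- one HBaseConfiguration for
> >>>>>> the whole client, reused for every HTable (0.20 API; the qualifier and
> >>>>>> value below are made up for illustration):
> >>>>>>
> >>>>>>   import org.apache.hadoop.hbase.HBaseConfiguration;
> >>>>>>   import org.apache.hadoop.hbase.client.HTable;
> >>>>>>   import org.apache.hadoop.hbase.client.Put;
> >>>>>>   import org.apache.hadoop.hbase.util.Bytes;
> >>>>>>   import java.util.UUID;
> >>>>>>
> >>>>>>   public class ChannelDeliveryWriter {
> >>>>>>     // One configuration instance shared by the whole client JVM.
> >>>>>>     private static final HBaseConfiguration CONF = new HBaseConfiguration();
> >>>>>>
> >>>>>>     public static void main(String[] args) throws Exception {
> >>>>>>       // HTable is not thread-safe: use one per thread, but always build
> >>>>>>       // it from the shared CONF so region locations are cached once.
> >>>>>>       HTable table = new HTable(CONF, "ChannelDelivery");
> >>>>>>       Put put = new Put(Bytes.toBytes(UUID.randomUUID().toString()));
> >>>>>>       put.add(Bytes.toBytes("ChannelDelivery_Family"),
> >>>>>>               Bytes.toBytes("payload"),
> >>>>>>               Bytes.toBytes("example value"));
> >>>>>>       table.put(put);
> >>>>>>     }
> >>>>>>   }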
> >>>>>>
> >>>>>>> Currently the live system is inserting over 7 Million records per day
> >>>>>>> (mostly between 8AM and 10PM), which is not a ridiculously high load.
> >>>>>>>
> >>>>>>>
> >>>>>> What size are the records?   What is your table schema?  How many
> >>>>>> regions do you currently have in your table?
> >>>>>>
> >>>>>>  St.Ack
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >
> >
> >
> >
> >
> >
>
