hbase-user mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: more regionservers does not improve performance
Date Mon, 15 Oct 2012 00:48:02 GMT
It could be network bound, especially if you have decently sized values
(~500B+).  HBase can be rough on the network because each value travels
from client to regionserver, then makes 2 additional network hops for the
WAL, and another 2 hops for the memstore flush, plus ongoing
compactions.  After disabling the WAL, enabling GZIP compression on the
table can cut down on the flush/compaction impact if your data is
compressible.
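
If it helps to see those two knobs together, here is a rough, untested sketch
against the 0.92/0.94-era Java client API (the table and family names
"mytable"/"cf" and the row key are just placeholders, not from your setup; in
newer versions the Compression class lives in a different package):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.hfile.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class CompressedTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // GZ compression on the column family, so flushes and compactions
    // write compressed HFiles and move fewer bytes over network/disk.
    HColumnDescriptor family = new HColumnDescriptor("cf");
    family.setCompressionType(Compression.Algorithm.GZ);
    HTableDescriptor table = new HTableDescriptor("mytable");
    table.addFamily(family);
    new HBaseAdmin(conf).createTable(table);

    // Skip the WAL on individual puts -- a durability trade-off, useful
    // mainly to see whether the WAL is the bottleneck.
    HTable htable = new HTable(conf, "mytable");
    Put put = new Put(Bytes.toBytes("some-row-key"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    put.setWriteToWAL(false);
    htable.put(put);
    htable.close();
  }
}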

How long are your row keys and values, and how many cells do you have per
row?  Longer keys would point towards internal limitations in HBase
(locking, CPU usage, etc.), while longer values would point towards network
and disk limitations.

Another consideration is that your workload may be too evenly distributed
and never settles into a steady state.  If you have 12-25 regions per
server and your workload is perfectly randomized, then all regions will hit
the memstore flush size simultaneously, which triggers 12-25 memstore
flushes at the same time.  The memstore flusher may be single threaded (I
forget), so you are suddenly hitting the blocking storefile limit, which
could explain the pauses you are seeing.  You could try reducing the number
of regions to ~4/server.  And make sure your memstore flush size is at
least 256M.
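
For reference, a rough, untested sketch of what I mean (again 0.92/0.94-era
API; the table/family names are placeholders, and the hex split points just
assume MD5-hex row keys spread over a 10-node cluster at ~4 regions per
server):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class FlushSizeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    HTableDescriptor table = new HTableDescriptor("mytable");
    table.addFamily(new HColumnDescriptor("cf"));
    // Flush each region's memstore at 256M for this table (the cluster-wide
    // equivalent is hbase.hregion.memstore.flush.size in hbase-site.xml).
    table.setMemStoreFlushSize(256L * 1024 * 1024);

    // ~4 regions per server on 10 servers = 40 regions, i.e. 39 split
    // points spread over the first hex byte of the MD5-hashed key space.
    byte[][] splits = new byte[39][];
    for (int i = 0; i < splits.length; i++) {
      splits[i] = Bytes.toBytes(String.format("%02x", (i + 1) * 256 / 40));
    }
    new HBaseAdmin(conf).createTable(table, splits);
  }
}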

Matt


On Sun, Oct 14, 2012 at 8:48 AM, Jonathan Bishop <jbishop.rwc@gmail.com> wrote:

> Matt,
>
> Yes, I did. What I observed is that the map job proceeds about 3-4x faster
> for a while. But then I observed long pauses partway through the job, and
> overall run time was only reduced modestly, from 50 minutes to 40
> minutes.
>
> Just to summarize the issue, my mapper jobs seem to scale nicely. This is
> expected as my dfs block size is small enough to create over 500 tasks, and
> I have a max of 40 mappers running.
>
> But when I include puts to HBase in my job, I see a 4-6x slowdown
> which does not improve as I increase the number of regionservers.
>
> My current best guess is that there is a network bottleneck in getting the
> puts produced by the mappers to the appropriate regionservers, as I assume
> that once the puts are received by the regionservers they can all
> operate in parallel without slowing each other down.
>
> Again, I am on a grid which is used by many others, and the machines in my
> cluster are not dedicated to my job. I am mainly looking at scalability
> trends when running with various numbers of regionservers.
>
> Jon
>
> On Sat, Oct 13, 2012 at 10:37 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
>
> > Did you try setting put.setWriteToWAL(false) as Bryan suggested?  This
> may
> > not be what you want in the end, but seeing what happens may help debug.
> >
> > Matt
> >
> > On Sat, Oct 13, 2012 at 8:58 AM, Jonathan Bishop <jbishop.rwc@gmail.com> wrote:
> >
> > > Suraj,
> > >
> > > I bumped my regionservers all the way up to 32g from 8g. They are running
> > > on 64g and 128g machines on our cluster. Unfortunately, the machines all
> > > have various states of loading (usually high) from other users.
> > >
> > > In ganglia I do not see any swapping, but that has been known to happen
> > > from time to time.
> > >
> > > Thanks for your help - I'll take a look at your links.
> > >
> > > Jon
> > >
> > > On Fri, Oct 12, 2012 at 7:30 PM, Suraj Varma <svarma.ng@gmail.com> wrote:
> > >
> > > > Hi Jonathan:
> > > > What specific metric on ganglia did you notice for "IO is spiking"?  Is
> > > > it your disk IO? Is your disk swapping? Do you see cpu iowait spikes?
> > > >
> > > > I see you have given 8g to the RegionServer ... how much RAM is
> > > > available total on that node? What heap are the individual mappers &
> > > > DN set to run on (i.e. check whether you are overallocated on heap
> > > > when the _mappers_ run ... causing disk swapping ... leading to IO?).
> > > >
> > > > There can be multiple causes ... so, you may need to look at ganglia
> > > > stats and narrow the bottleneck down as described in
> > > > http://hbase.apache.org/book/casestudies.perftroub.html
> > > >
> > > > Here's a good reference for all the memstore related tweaks you can
> > > > try (and also to understand what each configuration means):
> > > >
> > > > http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/
> > > >
> > > > Also, provide more details on your schema (CFs, row size), Put sizes,
> > > > etc as well to see if that triggers an idea from the list.
> > > > --S
> > > >
> > > >
> > > > On Fri, Oct 12, 2012 at 12:46 PM, Bryan Beaudreault
> > > > <bbeaudreault@hubspot.com> wrote:
> > > > > I recommend turning on debug logging on your region servers.  You may need
> > > > > to tune down certain packages back to info, because there are a few spammy
> > > > > ones, but overall it helps.
> > > > >
> > > > > You should see messages such as "12/10/09 14:22:57 INFO
> > > > > regionserver.HRegion: Blocking updates for 'IPC Server handler 41 on 60020'
> > > > > on region XXX: memstore size 256.0m is >= than blocking 256.0m size".  As
> > > > > you can see, this is an INFO anyway so you should be able to see it now if
> > > > > it is happening.
> > > > >
> > > > > You can try upping the number of IPC handlers and the memstore flush
> > > > > threshold.  Also, maybe you are bottlenecked by the WAL.  Try doing
> > > > > put.setWriteToWAL(false), just to see if it increases performance.  If so
> > > > > and you want to be a bit more safe with regard to the WAL, you can try
> > > > > turning on deferred flush on your table.  I don't really know how to
> > > > > increase performance of the WAL aside from that, if this does seem to
> > > > > have an effect.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 12, 2012 at 3:15 PM, Jonathan Bishop <jbishop.rwc@gmail.com> wrote:
> > > > >
> > > > >> Kevin,
> > > > >>
> > > > >> Sorry, I am fairly new to HBase. Can you be specific about what settings I
> > > > >> can change, and also where they are specified?
> > > > >>
> > > > >> Pretty sure I am not hotspotting, and increasing memstore does not seem to
> > > > >> have any effect.
> > > > >>
> > > > >> I do not see any messages in my regionserver logs concerning blocking.
> > > > >>
> > > > >> I am suspecting that I am hitting some limit in our grid, but would like to
> > > > >> know where that limit is being imposed.
> > > > >>
> > > > >> Jon
> > > > >>
> > > > >> On Fri, Oct 12, 2012 at 6:44 AM, Kevin O'dell <kevin.odell@cloudera.com> wrote:
> > > > >>
> > > > >> > Jonathan,
> > > > >> >
> > > > >> >   Let's take a deeper look here.
> > > > >> >
> > > > >> > What is your memstore set at for the table/CF in question?  Let's compare
> > > > >> > that value with the flush size you are seeing for your regions.  If they
> > > > >> > are really small flushes is it all to the same region?  If so that is going
> > > > >> > to be schema issues.  If they are full flushes you can up your memstore
> > > > >> > assuming you have the heap to cover it.  If they are smaller flushes but to
> > > > >> > different regions you most likely are suffering from global limit pressure
> > > > >> > and flushing too soon.
> > > > >> >
> > > > >> > Are you flushing prematurely due to HLogs rolling?  Take a look for too
> > > > >> > many HLogs and look at the flushes.  It may benefit you to raise that
> > > > >> > value.
> > > > >> >
> > > > >> > Are you blocking?  As Suraj was saying you may be blocking in 90 second
> > > > >> > blocks.  Check the RS logs for those messages as well and then follow
> > > > >> > Suraj's advice.
> > > > >> >
> > > > >> > This is where I would start to optimize your write path.  I hope the
> > > > >> > above helps.
> > > > >> >
> > > > >> > On Fri, Oct 12, 2012 at 3:34 AM, Suraj Varma <svarma.ng@gmail.com> wrote:
> > > > >> >
> > > > >> > > What have you configured your hbase.hstore.blockingStoreFiles and
> > > > >> > > hbase.hregion.memstore.block.multiplier? Both of these block updates
> > > > >> > > when the limit is hit. Try increasing these to say 20 and 4 from the
> > > > >> > > default 7 and 2 and see if it helps.
> > > > >> > >
> > > > >> > > If this still doesn't help, see if you can set up ganglia to get a
> > > > >> > > better insight into what is bottlenecking.
> > > > >> > > --Suraj
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Thu, Oct 11, 2012 at 11:47 PM, Pankaj Misra
> > > > >> > > <pankaj.misra@impetus.co.in> wrote:
> > > > >> > > > OK, looks like I missed out reading that part in your original mail.  Did
> > > > >> > > > you try some of the compaction tweaks and configurations as explained in
> > > > >> > > > the following link for your data?
> > > > >> > > > http://hbase.apache.org/book/regions.arch.html#compaction
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Also, how much data are you putting into the regions, and how big is
> > > > >> > > > one region at the end of data ingestion?
> > > > >> > > >
> > > > >> > > > Thanks and Regards
> > > > >> > > > Pankaj Misra
> > > > >> > > >
> > > > >> > > > -----Original Message-----
> > > > >> > > > From: Jonathan Bishop [mailto:jbishop.rwc@gmail.com]
> > > > >> > > > Sent: Friday, October 12, 2012 12:04 PM
> > > > >> > > > To: user@hbase.apache.org
> > > > >> > > > Subject: RE: more regionservers does not improve performance
> > > > >> > > >
> > > > >> > > > Pankaj,
> > > > >> > > >
> > > > >> > > > Thanks  for the reply.
> > > > >> > > >
> > > > >> > > > Actually, I am using MD5 hashing to evenly spread the keys among the
> > > > >> > > > splits, so I don't believe there is any hotspot.  In fact, when I monitor
> > > > >> > > > the web UI for HBase I see a very even load on all the regionservers.
> > > > >> > > >
> > > > >> > > > Jon
> > > > >> > > >
> > > > >> > > > Sent from my Windows 8 PC <http://windows.microsoft.com/consumer-preview>
> > > > >> > > >
> > > > >> > > >  *From:* Pankaj Misra <pankaj.misra@impetus.co.in>
> > > > >> > > > *Sent:* Thursday, October 11, 2012 8:24:32 PM
> > > > >> > > > *To:* user@hbase.apache.org
> > > > >> > > > *Subject:* RE: more regionservers does not improve performance
> > > > >> > > >
> > > > >> > > > Hi Jonathan,
> > > > >> > > >
> > > > >> > > > What seems to me is that, while doing the split across all 40 mappers,
> > > > >> > > > the keys are not randomized enough to leverage multiple regions and the
> > > > >> > > > pre-split strategy. This may be happening because all the 40 mappers may be
> > > > >> > > > trying to write onto a single region for some time, making it a HOT region,
> > > > >> > > > till the key falls into another region, and then the other region becomes
> > > > >> > > > a HOT region, hence you may be seeing a high impact of compaction cycles
> > > > >> > > > reducing your throughput.
> > > > >> > > >
> > > > >> > > > Are the keys incremental? Are the keys randomized enough across the
> > > > >> > > > splits?
> > > > >> > > >
> > > > >> > > > Ideally when all 40 mappers are running you should see all the regions
> > > > >> > > > being filled up in parallel for maximum throughput.  Hope it helps.
> > > > >> > > >
> > > > >> > > > Thanks and Regards
> > > > >> > > > Pankaj Misra
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > ________________________________________
> > > > >> > > > From: Jonathan Bishop [jbishop.rwc@gmail.com]
> > > > >> > > > Sent: Friday, October 12, 2012 5:38 AM
> > > > >> > > > To: user@hbase.apache.org
> > > > >> > > > Subject: more regionservers does not improve performance
> > > > >> > > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > >> > > > I am running a MR job with 40 simultaneous mappers, each of which does
> > > > >> > > > puts to HBase. I have ganged up the puts into groups of 1000 (this seems to
> > > > >> > > > help quite a bit) and also made sure that the table is pre-split into 100
> > > > >> > > > regions, and that the row keys are randomized using MD5 hashing.
> > > > >> > > >
> > > > >> > > > My cluster size is 10, and I am allowing 4 mappers per tasktracker.
> > > > >> > > >
> > > > >> > > > In my MR job I know that the mappers are able to generate puts much
> > > > >> > > > faster than the puts can be handled in HBase. In other words, if I let the
> > > > >> > > > mappers run without doing HBase puts then everything scales as you would
> > > > >> > > > expect with the number of mappers created. It is the HBase puts which seem
> > > > >> > > > to be the bottleneck.
> > > > >> > > >
> > > > >> > > > What is strange is that I do not get much run time improvement by
> > > > >> > > > increasing the number of regionservers beyond about 4.  Indeed, it seems
> > > > >> > > > that the system runs slower with 8 regionservers than with 4.
> > > > >> > > >
> > > > >> > > > I have added the following in hbase-env.sh hoping this would help...
> > > > >> > > > (from the book HBase in Action)
> > > > >> > > >
> > > > >> > > > export HBASE_OPTS="-Xmx8g"
> > > > >> > > > export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
> > > > >> > > >
> > > > >> > > > # Uncomment below to enable java garbage collection logging in the .out file.
> > > > >> > > > export HBASE_OPTS="${HBASE_OPTS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${HBASE_HOME}/logs/gc-hbase.log"
> > > > >> > > >
> > > > >> > > > Monitoring HBase through the web UI I see that there are pauses for
> > > > >> > > > flushing, which seems to run pretty quickly, and for compacting, which
> > > > >> > > > seems to take somewhat longer.
> > > > >> > > >
> > > > >> > > > Any advice for making this run faster would be greatly appreciated.
> > > > >> > > > Currently I am looking into installing Ganglia to better monitor my
> > > > >> > > > cluster, but have yet to get that running.
> > > > >> > > >
> > > > >> > > > I suspect an I/O issue as the regionservers do not seem terribly
> > > > >> > > > loaded.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > >
> > > > >> > > > Jon
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Kevin O'Dell
> > > > >> > Customer Operations Engineer, Cloudera
> > > > >> >
> > > > >>
> > > >
> > >
> >
>
