From: Stack <stack@duboce.net>
Date: Fri, 3 Mar 2017 06:01:45 +0000
Subject: Re: Region Server Hotspot/CPU Problem
To: Hbase-User <user@hbase.apache.org>

You could try CMS GC; the ThreadLocal clean-up seems less prominent when CMS
is in place. Try taking thread dumps and comparing them to the ones posted to
HBASE-17072 and related issues; the original poster there did a nice job
describing the problem.
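One rough way to do that comparison, sketched under some assumptions: take a few jstack dumps of the region server during a spike (jstack <pid> > dump.txt) and count how many threads are sitting in ThreadLocal map clean-up, then run the same count over the dumps on HBASE-17072. The SUSPECT_FRAMES list is an assumption about which frames to look for, not something copied from the JIRA, and the parser assumes standard jstack text where each thread section starts with a quoted thread name.

```python
import re
from collections import Counter

# Assumption: frames that indicate time spent cleaning up ThreadLocal maps.
# Check this list against the thread dumps attached to HBASE-17072.
SUSPECT_FRAMES = (
    "java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry",
    "java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss",
)


def split_threads(dump):
    """Split jstack text into per-thread sections.

    Each thread section in jstack output begins with a line that starts
    with a double quote (the thread name); preamble lines are dropped.
    """
    chunks = re.split(r'\n(?=")', dump)
    return [chunk for chunk in chunks if chunk.startswith('"')]


def scan_dump(dump):
    """Return (total threads, threads with any suspect frame, threads per frame)."""
    threads = split_threads(dump)
    per_frame = Counter()
    affected = 0
    for chunk in threads:
        # Count each frame at most once per thread.
        hits = [frame for frame in SUSPECT_FRAMES if frame in chunk]
        per_frame.update(hits)
        if hits:
            affected += 1
    return len(threads), affected, per_frame
```

Comparing the counts from a dump taken during a spike with one taken after the CPU settles is more informative than either number on its own.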

St.Ack

On Wed, Mar 1, 2017 at 2:49 PM, Saad Mufti <saad.mufti@gmail.com> wrote:

> Someone on our team found this:
>
> http://community.cloudera.com/t5/Storage-Random-Access-HDFS/CPU-Usage-high-when-using-G1GC/td-p/48101
>
> Looks like we've been bitten by this bug. Unfortunately it is only fixed in
> HBase 1.4.0, so we'll have to undertake a version upgrade, which is not
> trivial.
>
> -----
> Saad
>
>
> On Wed, Mar 1, 2017 at 9:38 AM, Sudhir Babu Pothineni <sbpothineni@gmail.com> wrote:
>
> > The first obvious thing to check: is a major compaction running at the
> > same time the CPU goes to 100%?
> > This may help:
> > https://community.hortonworks.com/articles/52616/hbase-compaction-tuning-tips.html
> >
> > > On Mar 1, 2017, at 6:06 AM, Saad Mufti <saad.mufti@teamaol.com> wrote:
> > >
> > > Hi,
> > >
> > > We are using HBase 1.0.0-cdh5.5.2 on AWS EC2 instances. The load on HBase
> > > is heavy and a mix of reads and writes. For a few months we have had a
> > > problem where occasionally (once a day or more) one of the region servers
> > > starts consuming close to 100% CPU. This causes the entire client thread
> > > pool to fill up serving the slow region server, slowing overall response
> > > times to a crawl, and many calls start timing out either in the client
> > > itself or at a higher level.
> > >
> > > We have done lots of analysis and looked at various metrics but could
> > > never pin it down to any particular kind of traffic or specific "hot
> > > keys". Looking at the region server logs has not turned up anything
> > > either. The only vague evidence we have comes from the reported metrics:
> > > reads per second on the hot server look higher than on the others, not
> > > at a steady level but in a consistently spiky pattern, while gets per
> > > second look no different from any other server.
> > >
> > > Until now, our hacky workaround has been to simply restart the region
> > > server. This works because, although some calls error out while the
> > > regions are in transition, this is a batch-oriented system with a
> > > built-in retry strategy.
> > >
> > > But just yesterday we discovered something interesting: if we connect to
> > > the region server with VisualVM and press the "Perform GC" button, there
> > > is a brief pause and then CPU settles back to normal. This is despite the
> > > fact that memory appears to be under no pressure and, before we do this,
> > > VisualVM indicates a very low percentage of CPU time spent in GC. So we're
> > > baffled, and hope someone with deeper insight into the HBase code can
> > > explain this behavior.
> > >
> > > Our region server processes are configured with a 32GB heap and the
> > > following GC-related JVM settings:
> > >
> > > HBASE_REGIONSERVER_OPTS=-Xms34359738368 -Xmx34359738368 -XX:+UseG1GC
> > > -XX:MaxGCPauseMillis=100
> > > -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=14
> > > -XX:InitiatingHeapOccupancyPercent=70
> > >
> > > Any insight anyone can provide would be most appreciated.
> > >
> > > ----
> > > Saad
> >
>
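For reference, a worked calculation on the JVM flags quoted above, assuming they are the ones actually in effect: -Xmx34359738368 is exactly 32 GiB of heap, and with -XX:InitiatingHeapOccupancyPercent=70, G1 (as of JDK 8) starts a concurrent marking cycle only once roughly 22.4 GiB of the whole heap is occupied. The snippet only does arithmetic on the copied values; it does not query a running JVM.

```python
# Values copied from the HBASE_REGIONSERVER_OPTS line quoted above.
XMX_BYTES = 34359738368   # -Xms / -Xmx, given in bytes
IHOP_PERCENT = 70         # -XX:InitiatingHeapOccupancyPercent

GIB = 1024 ** 3

heap_gib = XMX_BYTES / GIB
# In JDK 8's G1, IHOP is the whole-heap occupancy percentage at which a
# concurrent marking cycle is started.
marking_threshold_gib = heap_gib * IHOP_PERCENT / 100

print(f"heap: {heap_gib:g} GiB; concurrent marking starts near "
      f"{marking_threshold_gib:g} GiB occupied")
```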

--001a114694badbe8860549cd4722--