hbase-user mailing list archives

From Sean Busbey <bus...@apache.org>
Subject Re: Region Server Hotspot/CPU Problem
Date Wed, 01 Mar 2017 15:25:45 GMT
Several of those JIRAs are fixed in later versions of CDH. Since which
JIRAs a particular vendor includes in its packaging is a vendor-specific
issue, please seek help from the vendor (e.g. on the community forum
you've just mentioned).

On Wed, Mar 1, 2017 at 8:49 AM, Saad Mufti <saad.mufti@gmail.com> wrote:
> Someone in our team found this:
>
> http://community.cloudera.com/t5/Storage-Random-Access-HDFS/CPU-Usage-high-when-using-G1GC/td-p/48101
>
> Looks like we're bitten by this bug. Unfortunately this is only fixed in
> HBase 1.4.0 so we'll have to undertake a version upgrade which is not
> trivial.
>
> -----
> Saad
>
>
> On Wed, Mar 1, 2017 at 9:38 AM, Sudhir Babu Pothineni <sbpothineni@gmail.com> wrote:
>
>> The first obvious thing to check: is a major compaction happening at the
>> same time the CPU goes to 100%?
>> See if this helps:
>> https://community.hortonworks.com/articles/52616/hbase-compaction-tuning-tips.html
>>
>>
>>
>> Sent from my iPhone
>>
>> > On Mar 1, 2017, at 6:06 AM, Saad Mufti <saad.mufti@teamaol.com> wrote:
>> >
>> > Hi,
>> >
>> > We are using HBase 1.0.0-cdh5.5.2 on AWS EC2 instances. The load on HBase
>> > is heavy and a mix of reads and writes. For a few months we have had a
>> > problem where occasionally (once a day or more) one of the region servers
>> > starts consuming close to 100% CPU. The entire client thread pool then
>> > fills up with calls to the slow region server, overall response times
>> > slow to a crawl, and many calls start timing out, either in the client
>> > itself or at a higher level.
>> >
>> > We have done a lot of analysis and looked at various metrics, but could
>> > never pin it down to any particular kind of traffic or specific "hot keys".
>> > Looking at the region server logs has not turned up anything. The only
>> > vague evidence we have comes from the reported metrics: reads per second
>> > on the hot server look higher than on the others, not at a constant
>> > level but in recurring spikes, while gets per second look no different
>> > from any other server.
>> >
>> > Until now, our hacky workaround has been to just restart the region
>> > server. This works because, although some calls error out while the
>> > regions are in transition, this is a batch-oriented system with a retry
>> > strategy built in.
>> >
>> > But just yesterday we discovered something interesting: if we connect to
>> > the region server in VisualVM and press the "Perform GC" button, there
>> > is a brief pause and then CPU settles back down to normal. This happens
>> > even though memory appears to be under no pressure and, before we do
>> > this, VisualVM indicates a very low percentage of CPU time spent in GC.
>> > We're baffled, and hoping someone with deeper insight into the HBase
>> > code could explain this behavior.
>> >
>> > Our region server processes are configured with a 32GB heap and the
>> > following GC-related JVM settings:
>> >
>> > HBASE_REGIONSERVER_OPTS=-Xms34359738368 -Xmx34359738368 -XX:+UseG1GC
>> > -XX:MaxGCPauseMillis=100
>> > -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:ParallelGCThreads=14
>> > -XX:InitiatingHeapOccupancyPercent=70
>> >
>> > Any insight anyone can provide would be most appreciated.
>> >
>> > ----
>> > Saad
>>
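
One common way to see what a JVM is doing while it pins the CPU, as in the report quoted above, is to match the busiest native thread IDs from `top -H` against the `nid=0x...` fields of a `jstack` thread dump. The sketch below is not from the thread: the helper name `tid_to_nid`, the `PID` placeholder, and the example thread ID are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: convert a decimal thread ID (as shown by `top -H`)
# into the hexadecimal form that jstack prints as nid=0x...
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Typical use against a running region server (PID is a placeholder):
#   top -H -b -n 1 -p "$PID" | head -n 20                   # per-thread CPU usage
#   jstack "$PID" | grep -A 15 "nid=$(tid_to_nid 12345) "   # stack of thread 12345
```

Taking one dump while the CPU is pegged and another right after forcing a collection may show which threads stop spinning. From the command line, `jcmd "$PID" GC.run` requests a full GC and should have an effect similar to VisualVM's "Perform GC" button.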

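The `-Xms` and `-Xmx` values in the quoted settings are raw byte counts, which are easy to misread. As a quick sanity check that they match the stated 32GB, a minimal sketch follows; the helper names `max_heap_bytes` and `bytes_to_gib` are hypothetical, and the parsing assumes the value is a plain number with no `g`/`m` suffix.

```shell
#!/bin/sh
# Options as quoted in the thread.
HBASE_REGIONSERVER_OPTS="-Xms34359738368 -Xmx34359738368 -XX:+UseG1GC \
-XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled -XX:-ResizePLAB \
-XX:ParallelGCThreads=14 -XX:InitiatingHeapOccupancyPercent=70"

# Hypothetical helper: print the -Xmx value when it is a plain byte count.
max_heap_bytes() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^-Xmx\([0-9][0-9]*\)$/\1/p'
}

# Hypothetical helper: convert bytes to whole GiB (integer division).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Prints 32 for the settings above.
bytes_to_gib "$(max_heap_bytes "$HBASE_REGIONSERVER_OPTS")"
```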