hbase-user mailing list archives

From John Leach <jle...@splicemachine.com>
Subject Re: Hot Region Server With No Hot Region
Date Thu, 01 Dec 2016 19:56:17 GMT

Did you validate that the hbase:meta region is not hosted on the “hot” region server?

John Leach

> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> Hi,
> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid
> hotspotting due to inadvertent data patterns by prepending an MD5-based
> 4-digit hash prefix to all our data keys. This works fine most of the time,
> but recently, more and more often (as much as once or twice a day), one
> region server suddenly becomes "hot" (CPU at or around 95% in various
> monitoring tools). When it happens it lasts for hours. Occasionally the
> hotspot jumps to another region server as the master decides the region is
> unresponsive and gives its region to another server.
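
For readers unfamiliar with the key-salting scheme described above, here is a minimal sketch in Python. It is not the poster's actual code: the function name `salted_key` is hypothetical, and it assumes the "4 digit" prefix means the first four hexadecimal characters of the MD5 digest of the original key.

```python
import hashlib


def salted_key(key: bytes, prefix_len: int = 4) -> bytes:
    """Prepend the first `prefix_len` hex digits of the key's MD5 digest.

    Assumption: the prefix is taken from the hex form of the digest; the
    original message does not say how the four digits are derived.
    """
    prefix = hashlib.md5(key).hexdigest()[:prefix_len].encode("ascii")
    return prefix + key


if __name__ == "__main__":
    # Lexicographically adjacent keys usually get unrelated prefixes,
    # which spreads them across the key space (and hence across regions).
    for raw in (b"user-0001", b"user-0002", b"user-0003"):
        print(salted_key(raw))
```

Because the prefix is a deterministic function of the key, a reader can recompute it for point Gets; range scans over the original key order are what this scheme gives up.
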
> For the longest time, we thought this must be some single rogue key in our
> input data that is being hammered. All attempts to track this down have
> failed though, and the following behavior argues against this being
> application based:
> 1. Plotting Get and Put rates by region on the "hot" region server in
> Cloudera Manager Charts shows that no single region is an outlier.
> 2. Cleanly restarting just the region server process causes its regions to
> randomly migrate to other region servers, then it gets new ones from the
> HBase master, basically a sort of shuffling, and then the hotspot goes away.
> If it were application based, you'd expect the hotspot to just jump to
> another region server.
> 3. We have pored through the region server logs and can't see anything out
> of the ordinary happening.
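
The first check in the list above (no single region stands out) can also be done numerically rather than by eye. The following is only a sketch under stated assumptions: the per-region request rates are assumed to have been exported already into a plain dictionary, the function name `outlier_regions` is hypothetical, and the "more than 3x the median" rule is an arbitrary illustrative threshold, not anything taken from Cloudera Manager.

```python
from statistics import median
from typing import Dict, List


def outlier_regions(rate_by_region: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return regions whose request rate exceeds `factor` times the median.

    `rate_by_region` maps a region name to its combined Get+Put rate
    (hypothetical input; how the rates are collected is out of scope here).
    """
    if not rate_by_region:
        return []
    threshold = factor * median(rate_by_region.values())
    return [region for region, rate in rate_by_region.items() if rate > threshold]
```

An empty result for the hot server is consistent with the observation in the message that the load is not concentrated on one region.
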
> The only other pertinent thing to mention might be that we have a special
> process of our own running outside the cluster that does cluster-wide major
> compaction in a rolling fashion, where each batch consists of one region
> from each region server, and it waits until one batch is completely done
> before starting another. We have seen no real impact on the hotspot from
> shutting this down, and in normal times it doesn't impact our read or write
> performance much.
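
The rolling compaction scheme described above (one region per region server in each batch, with each batch finishing before the next starts) can be sketched as follows. This is an illustration only, not the poster's tool: `rolling_batches`, `run_rolling_compaction`, and the `compact_and_wait` callback are hypothetical names, and the actual call that triggers and waits for a major compaction is deliberately left to the caller.

```python
from itertools import zip_longest
from typing import Callable, Dict, Iterator, List


def rolling_batches(regions_by_server: Dict[str, List[str]]) -> Iterator[List[str]]:
    """Yield batches that contain at most one region from each region server."""
    for group in zip_longest(*regions_by_server.values()):
        # Servers with fewer regions run out early; skip their padding.
        yield [region for region in group if region is not None]


def run_rolling_compaction(
    regions_by_server: Dict[str, List[str]],
    compact_and_wait: Callable[[List[str]], None],
) -> None:
    """Process batches strictly one after another.

    `compact_and_wait` is a hypothetical callback that must block until
    every region in the batch has finished its major compaction.
    """
    for batch in rolling_batches(regions_by_server):
        compact_and_wait(batch)
```

With this layout each region server compacts at most one region at a time, which matches the message's remark that the process has little effect on read or write performance in normal operation.
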
> We are at our wit's end. Does anyone have experience with a scenario like
> this? Any help or guidance would be most appreciated.
> -----
> Saad
