hbase-user mailing list archives

From John Leach <jle...@splicemachine.com>
Subject Re: Hot Region Server With No Hot Region
Date Thu, 01 Dec 2016 20:32:09 GMT

A region move or split causes client connections to refresh their cached meta
locations simultaneously.

The key word is "supposed". We have seen meta hotspotting from time to time, on
different versions, at Splice Machine.

How confident are you in your hashing algorithm?
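
One rough way to check this, sketched in Python: salt a sample of row keys and
compare how many land in each prefix bucket. The scheme below (the first four
hex characters of the MD5 digest, prepended to the key) is only an assumption
about what "MD5 based 4 digit hash prefix" means, and `salted_key` and
`prefix_skew` are hypothetical helper names, not HBase APIs.

```python
import hashlib
from collections import Counter


def salted_key(key: str, digits: int = 4) -> str:
    """Prepend the first `digits` hex characters of the key's MD5 digest.

    Assumed scheme; the real prefix format may differ.
    """
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:digits]
    return prefix + key


def prefix_skew(keys, bucket_chars: int = 1):
    """Return (max/mean bucket ratio, per-bucket counts) for salted keys.

    Keys are bucketed by the first `bucket_chars` characters of the salted
    key. The mean is taken over the buckets that actually received keys.
    """
    counts = Counter(salted_key(k)[:bucket_chars] for k in keys)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean, counts


if __name__ == "__main__":
    # Synthetic sample: 10,000 distinct keys plus one heavily repeated key.
    sample = [f"row-{i}" for i in range(10_000)] + ["hot-key"] * 5_000
    ratio, counts = prefix_skew(sample)
    print(f"max/mean bucket ratio: {ratio:.2f}")
    print(counts.most_common(3))
```

A ratio near 1.0 suggests the prefix spreads keys evenly. A much larger ratio
points at a few heavily repeated keys, which no hash prefix can spread out,
because identical keys always hash to the same prefix.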

John Leach

> On Dec 1, 2016, at 2:25 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> No, I never thought about that. I just figured out how to locate the server
> for that table after you mentioned it. We'll have to keep an eye on it next
> time we have a hotspot to see if it coincides with the hotspot server.
> What would be the theory for how it could become a hotspot? Isn't the
> client supposed to cache it and only go back for a refresh if it hits a
> region that is not in its expected location?
> ----
> Saad
> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jleach@splicemachine.com> wrote:
>> Saad,
>> Did you validate that Meta is not on the “Hot” region server?
>> Regards,
>> John Leach
>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
>>> Hi,
>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid
>>> hotspotting due to inadvertent data patterns by prepending an MD5-based
>>> 4-digit hash prefix to all our data keys. This works fine most of the
>>> time, but more and more often recently (as much as once or twice a day)
>>> we have occasions where one region server suddenly becomes "hot" (CPU
>>> above or around 95% in various monitoring tools). When it happens it
>>> lasts for hours; occasionally the hotspot jumps to another region
>>> server as the master decides the region is unresponsive and gives its
>>> region to another server.
>>> For the longest time, we thought this must be some single rogue key in
>>> our input data that is being hammered. All attempts to track this down
>>> have failed though, and the following behavior argues against this
>>> being application based:
>>> 1. Plotting Get and Put rates by region on the "hot" region server in
>>> Cloudera Manager Charts shows that no single region is an outlier.
>>> 2. Cleanly restarting just the region server process causes its regions
>>> to randomly migrate to other region servers, then it gets new ones from
>>> the HBase master, basically a sort of shuffling, and then the hotspot
>>> goes away. If it were application based, you'd expect the hotspot to
>>> just jump to another region server.
>>> 3. We have pored through the region server logs and can't see anything
>>> out of the ordinary happening.
>>> The only other pertinent thing to mention might be that we have a special
>>> process of our own running outside the cluster that does cluster-wide
>>> major compaction in a rolling fashion, where each batch consists of one
>>> region from each region server, and it waits until one batch is
>>> completely done before starting another. We have seen no real impact on
>>> the hotspot from shutting this down, and in normal times it doesn't
>>> impact our read or write performance much.
>>> We are at our wits' end. Does anyone have experience with a scenario
>>> like this? Any help/guidance would be most appreciated.
>>> -----
>>> Saad
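
For readers following the thread, the rolling compaction scheme described in
the original message (each batch holds one region from each region server, and
a batch finishes before the next one starts) can be sketched roughly as below.
This is an illustration only, not the poster's actual tool: `rolling_batches`,
`run_rolling`, and the `compact_and_wait` callback are hypothetical names, and
no real HBase call is made.

```python
from itertools import zip_longest


def rolling_batches(regions_by_server):
    """Yield batches holding at most one region from each region server.

    `regions_by_server` maps a server name to a list of its region names.
    Servers that run out of regions simply drop out of later batches.
    """
    columns = [list(regions) for regions in regions_by_server.values()]
    for batch in zip_longest(*columns):
        yield [region for region in batch if region is not None]


def run_rolling(regions_by_server, compact_and_wait):
    """Process batches strictly one after another.

    `compact_and_wait` is a hypothetical callback that requests a major
    compaction for every region in the batch and blocks until all finish.
    """
    for batch in rolling_batches(regions_by_server):
        compact_and_wait(batch)


if __name__ == "__main__":
    layout = {"rs1": ["a", "b"], "rs2": ["c"], "rs3": ["d", "e", "f"]}
    run_rolling(layout, lambda batch: print("compacting", batch))
```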
