From: John Leach
To: user@hbase.apache.org
Subject: Re: Hot Region Server With No Hot Region
Date: Fri, 2 Dec 2016 13:34:45 -0600

Here is what I see...

* Short Compaction Running on Heap:

"regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
        at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
        at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
        at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
        at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
        at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
        at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

* WAL Syncs waiting... ALL 5:

"sync.0" - Thread t@202
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <67ba892d> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
        at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
        at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
        at java.lang.Thread.run(Thread.java:745)

* Mutations backing up very badly:

"B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
        at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
        at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
        at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
        at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
        at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
        at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

Too many writers are being blocked attempting to write to the WAL. What does your disk infrastructure look like? Can you get away with multi-WAL?

Ugh...

Regards,
John Leach
> On Dec 2, 2016, at 1:20 PM, Saad Mufti wrote:
> 
> Hi Ted,
> 
> Finally we have another hotspot going on, same symptoms as before. Here is the pastebin for the stack trace from the region server that I obtained via VisualVM:
> 
> http://pastebin.com/qbXPPrXk
> 
> Would really appreciate any insight you or anyone else can provide.
> 
> Thanks.
> 
> ----
> Saad
> 
> On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti wrote:
> 
>> Sure will, the next time it happens.
>> 
>> Thanks!!!
>> 
>> ----
>> Saad
>> 
>> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu wrote:
>> 
>>> From #2 in the initial email, the hbase:meta might not be the cause for the hotspot.
>>> 
>>> Saad:
>>> Can you pastebin the stack trace of the hot region server when this happens again?
>>> 
>>> Thanks
>>> 
>>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti wrote:
>>>> 
>>>> We used a pre-split into 1024 regions at the start, but we miscalculated our data size, so there were still auto-split storms at the beginning as the data size stabilized. It has ended up at around 9,500 or so regions, plus a few thousand regions for a few other tables (much smaller). But we haven't had any new auto-splits in a couple of months, and the hotspots only started happening recently.
>>>> 
>>>> Our hashing scheme is very simple: we take the MD5 of the key, then form a 4-digit prefix based on the first two bytes of the MD5, normalized to be within the range 0-1023. I am fairly confident about this scheme, especially since even during the hotspot we see no evidence so far that any particular region is taking disproportionate traffic (based on Cloudera Manager per-region charts on the hotspot server). Does that look like a reasonable scheme to randomize which region any given key goes to? And the start of the hotspot doesn't seem to correspond to any region splitting or any region moving from one server to another.
>>>> 
>>>> Thanks.
>>>> 
>>>> ----
>>>> Saad
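To make the scheme described above concrete, here is a minimal Java sketch of that style of salt-prefix computation. The class and method names and the exact normalization are illustrative assumptions, not Saad's actual code:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class SaltPrefix {

        // Take the MD5 of the logical key, read the first two bytes as an unsigned
        // 16-bit value, reduce it into 0-1023, and prepend it as a fixed-width
        // 4-digit prefix (hypothetical helper for illustration only).
        static String saltedKey(String key) {
            try {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
                int firstTwoBytes = ((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF); // 0..65535
                int bucket = firstTwoBytes % 1024;                                  // 0..1023
                return String.format("%04d%s", bucket, key);                        // e.g. "0871<original key>"
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError("MD5 is always available in the JDK", e);
            }
        }

        public static void main(String[] args) {
            System.out.println(saltedKey("some-row-key"));
        }
    }

One useful property of this construction: because 65,536 (the range of the first two MD5 bytes) is an exact multiple of 1,024, the modulo step does not bias any bucket, so distribution across the 1,024 prefixes is as uniform as MD5 itself.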
>>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach wrote:
>>>>> 
>>>>> Saad,
>>>>> 
>>>>> A region move or split causes client connections to simultaneously refresh their meta.
>>>>> 
>>>>> The key word is "supposed". We have seen meta hotspotting from time to time, and on different versions, at Splice Machine.
>>>>> 
>>>>> How confident are you in your hashing algorithm?
>>>>> 
>>>>> Regards,
>>>>> John Leach
>>>>> 
>>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti wrote:
>>>>>> 
>>>>>> No, never thought about that. I just figured out how to locate the server for that table after you mentioned it. We'll have to keep an eye on it next time we have a hotspot to see if it coincides with the hotspot server.
>>>>>> 
>>>>>> What would be the theory for how it could become a hotspot? Isn't the client supposed to cache it and only go back for a refresh if it hits a region that is not in its expected location?
>>>>>> 
>>>>>> ----
>>>>>> Saad
>>>>>> 
>>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach wrote:
>>>>>> 
>>>>>>> Saad,
>>>>>>> 
>>>>>>> Did you validate that Meta is not on the "Hot" region server?
>>>>>>> 
>>>>>>> Regards,
>>>>>>> John Leach
>>>>>>> 
>>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid hotspotting due to inadvertent data patterns by prepending an MD5-based 4-digit hash prefix to all our data keys. This works fine most of the time, but more and more recently (as much as once or twice a day) we have occasions where one region server suddenly becomes "hot" (CPU above or around 95% in various monitoring tools). When it happens it lasts for hours, and occasionally the hotspot might jump to another region server as the master decides the region is unresponsive and gives its region to another server.
>>>>>>>> 
>>>>>>>> For the longest time, we thought this must be some single rogue key in our input data that is being hammered. All attempts to track this down have failed, though, and the following behavior argues against it being application based:
>>>>>>>> 
>>>>>>>> 1. Plotting the Get and Put rate by region on the "hot" region server in Cloudera Manager charts shows no single region is an outlier.
>>>>>>>> 
>>>>>>>> 2. Cleanly restarting just the region server process causes its regions to randomly migrate to other region servers, then it gets new ones from the HBase master, basically a sort of shuffling, and then the hotspot goes away. If it were application based, you'd expect the hotspot to just jump to another region server.
>>>>>>>> 
>>>>>>>> 3. We have pored through the region server logs and can't see anything out of the ordinary happening.
>>>>>>>> 
>>>>>>>> The only other pertinent thing to mention might be that we have a special process of our own running outside the cluster that does cluster-wide major compaction in a rolling fashion, where each batch consists of one region from each region server, and it waits until one batch is completely done before starting another. We have seen no real impact on the hotspot from shutting this down, and in normal times it doesn't impact our read or write performance much.
>>>>>>>> 
>>>>>>>> We are at our wit's end. Does anyone have experience with a scenario like this? Any help/guidance would be most appreciated.
>>>>>>>> 
>>>>>>>> -----
>>>>>>>> Saad
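For readers who want to see what that kind of external rolling-compaction driver looks like, below is a minimal sketch against the HBase 1.x Admin API. The class name, the batch construction, and the 10-second poll interval are illustrative assumptions rather than the poster's actual tool, and a production version would also need to handle the brief window before a requested compaction shows up in the reported state.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RollingMajorCompaction {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {
                // Snapshot the current region assignment, one list per region server.
                List<List<HRegionInfo>> perServer = new ArrayList<>();
                for (ServerName server : admin.getClusterStatus().getServers()) {
                    perServer.add(admin.getOnlineRegions(server));
                }
                boolean more = true;
                for (int batch = 0; more; batch++) {
                    more = false;
                    // Each batch takes at most one region from every region server.
                    List<byte[]> batchRegions = new ArrayList<>();
                    for (List<HRegionInfo> regions : perServer) {
                        if (batch < regions.size()) {
                            batchRegions.add(regions.get(batch).getRegionName());
                            more = true;
                        }
                    }
                    for (byte[] regionName : batchRegions) {
                        admin.majorCompactRegion(regionName);   // request is asynchronous
                    }
                    // Block until the whole batch has finished before starting the next one.
                    // Comparing the state by name sidesteps the version-specific enum type.
                    for (byte[] regionName : batchRegions) {
                        while (!"NONE".equals(admin.getCompactionStateForRegion(regionName).toString())) {
                            Thread.sleep(10_000L);
                        }
                    }
                }
            }
        }
    }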