From: Jeremy Carroll
Date: Sat, 03 Dec 2016 15:39:36 +0000
Subject: Re: Hot Region Server With No Hot Region
To: user@hbase.apache.org

I would check compaction, and investigate throttling if it's causing high CPU.

On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti wrote:

> No.
>
> ----
> Saad
>
>
> On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu wrote:
>
> > Somehow I couldn't access the pastebin (I am in China now).
> > Is the region server showing the hotspot also hosting meta?
> > Thanks
> >
> > On Friday, December 2, 2016 11:53 AM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >
> > We're in AWS with d2.4xlarge instances. Each instance has 12 independent
> > spindles/disks from what I can tell.
> >
> > We have charted get_rate and mutate_rate by host, and:
> >
> > a) mutate_rate shows no real outliers
> > b) get_rate shows the overall rate on the "hotspot" region server is a bit
> > higher than on every other server, not severely but enough to be
> > noticeable. But when we chart get_rate on that server by region, no one
> > region stands out.
> >
> > get_rate chart by host:
> >
> > https://snag.gy/hmoiDw.jpg
> >
> > mutate_rate chart by host:
> >
> > https://snag.gy/jitdMN.jpg
> >
> > ----
> > Saad
> >
> >
> > On Fri, Dec 2, 2016 at 2:34 PM, John Leach wrote:
> >
> > > Here is what I see...
> > >
> > > * Short compaction running on heap
> > >
> > > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
> > >   java.lang.Thread.State: RUNNABLE
> > >     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> > >     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
> > >     at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
> > >     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> > >     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
> > >     at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
> > >     at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
> > >     at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> > >     at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
> > >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > >     at java.lang.Thread.run(Thread.java:745)
> > >
> > > * WAL syncs waiting... ALL 5
> > >
> > > "sync.0" - Thread t@202
> > >   java.lang.Thread.State: TIMED_WAITING
> > >     at java.lang.Object.wait(Native Method)
> > >     - waiting on <67ba892d> (a java.util.LinkedList)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
> > >     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
> > >     at java.lang.Thread.run(Thread.java:745)
> > >
> > > * Mutations backing up very badly...
> > >
> > > "B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
> > >   java.lang.Thread.State: TIMED_WAITING
> > >     at java.lang.Object.wait(Native Method)
> > >     - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
> > >     at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
> > >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
> > >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
> > >     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > >     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > >     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > >     at java.lang.Thread.run(Thread.java:745)
> > >
> > > Too many writers being blocked attempting to write to WAL.
> > >
> > > What does your disk infrastructure look like? Can you get away with
> > > multi-WAL? Ugh...
> > >
> > > Regards,
> > > John Leach
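
For context on the multi-WAL suggestion: MultiWAL lets a region server run several WAL pipelines in parallel so that syncs are spread across more than one HDFS stream. A minimal sketch of enabling it, assuming a release with MultiWAL support and the property names documented in the HBase reference guide (verify both against the exact HBase/CDH version in use):

    <!-- hbase-site.xml on each region server: sketch of enabling MultiWAL -->
    <property>
      <name>hbase.wal.provider</name>
      <value>multiwal</value>
    </property>
    <!-- optional: number of WAL groups (pipelines) per region server -->
    <property>
      <name>hbase.wal.regiongrouping.numgroups</name>
      <value>2</value>
    </property>

Each region is mapped to one WAL group, so a single extremely hot region still syncs through one pipeline; the gain comes from spreading many regions' edits across several sync threads, which is the "all sync threads busy" pattern visible in the stack traces above.
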
> > > > On Dec 2, 2016, at 1:20 PM, Saad Mufti wrote:
> > > >
> > > > Hi Ted,
> > > >
> > > > Finally we have another hotspot going on, with the same symptoms as
> > > > before. Here is the pastebin for the stack trace from the region
> > > > server, which I obtained via VisualVM:
> > > >
> > > > http://pastebin.com/qbXPPrXk
> > > >
> > > > Would really appreciate any insight you or anyone else can provide.
> > > >
> > > > Thanks.
> > > >
> > > > ----
> > > > Saad
> > > >
> > > >
> > > > On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti wrote:
> > > >
> > > >> Sure will, the next time it happens.
> > > >>
> > > >> Thanks!!!
> > > >>
> > > >> ----
> > > >> Saad
> > > >>
> > > >>
> > > >> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu wrote:
> > > >>
> > > >>> From #2 in the initial email, hbase:meta might not be the cause of
> > > >>> the hotspot.
> > > >>>
> > > >>> Saad:
> > > >>> Can you pastebin a stack trace of the hot region server when this
> > > >>> happens again?
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti wrote:
> > > >>>>
> > > >>>> We used a pre-split into 1024 regions at the start, but we
> > > >>>> miscalculated our data size, so there were still auto-split storms
> > > >>>> at the beginning as the data size stabilized. The table has ended
> > > >>>> up at around 9500 or so regions, plus a few thousand regions for a
> > > >>>> few other (much smaller) tables. But we haven't had any new
> > > >>>> auto-splits in a couple of months, and the hotspots only started
> > > >>>> happening recently.
> > > >>>>
> > > >>>> Our hashing scheme is very simple: we take the MD5 of the key, then
> > > >>>> form a 4-digit prefix based on the first two bytes of the MD5,
> > > >>>> normalized to be within the range 0-1023. I am fairly confident
> > > >>>> about this scheme, especially since even during the hotspot we see
> > > >>>> no evidence so far that any particular region is taking
> > > >>>> disproportionate traffic (based on the Cloudera Manager per-region
> > > >>>> charts for the hotspot server). Does that look like a reasonable
> > > >>>> scheme to randomize which region any given key goes to? Also, the
> > > >>>> start of the hotspot doesn't seem to correspond to any region
> > > >>>> splitting or moving from one server to another.
> > > >>>>
> > > >>>> Thanks.
> > > >>>>
> > > >>>> ----
> > > >>>> Saad
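
The salting described above amounts to something like the following sketch (hypothetical; the actual key encoding and normalization in the application may differ):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    /** Sketch of an MD5-based 4-digit salt prefix in the range 0000-1023. */
    public final class SaltedKey {

        /** Prepends a 4-digit bucket derived from the first two MD5 bytes of the key. */
        public static String salt(String key) {
            try {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
                // Combine the first two digest bytes and normalize into 0..1023.
                int bucket = (((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF)) % 1024;
                return String.format("%04d%s", bucket, key);
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        }

        public static void main(String[] args) {
            // Prints the salted form of a sample key.
            System.out.println(salt("example-row-key"));
        }
    }

A scheme of this shape spreads keys essentially uniformly across the 1024 buckets as long as traffic is spread over many distinct keys, which is consistent with the observation that no single region stands out during the hotspot.
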
> > > >>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach <jleach@splicemachine.com> wrote:
> > > >>>>>
> > > >>>>> Saad,
> > > >>>>>
> > > >>>>> A region move or split causes client connections to simultaneously
> > > >>>>> refresh their meta.
> > > >>>>>
> > > >>>>> The key word is "supposed". We have seen meta hotspotting from
> > > >>>>> time to time, and on different versions, at Splice Machine.
> > > >>>>>
> > > >>>>> How confident are you in your hashing algorithm?
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> John Leach
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti wrote:
> > > >>>>>>
> > > >>>>>> No, never thought about that. I just figured out how to locate
> > > >>>>>> the server for that table after you mentioned it. We'll have to
> > > >>>>>> keep an eye on it the next time we have a hotspot, to see if it
> > > >>>>>> coincides with the hotspot server.
> > > >>>>>>
> > > >>>>>> What would be the theory for how it could become a hotspot? Isn't
> > > >>>>>> the client supposed to cache it and only go back for a refresh if
> > > >>>>>> it hits a region that is not in its expected location?
> > > >>>>>>
> > > >>>>>> ----
> > > >>>>>> Saad
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jleach@splicemachine.com> wrote:
> > > >>>>>>
> > > >>>>>>> Saad,
> > > >>>>>>>
> > > >>>>>>> Did you validate that meta is not on the "hot" region server?
> > > >>>>>>>
> > > >>>>>>> Regards,
> > > >>>>>>> John Leach
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care
> > > >>>>>>>> to avoid hotspotting due to inadvertent data patterns by
> > > >>>>>>>> prepending an MD5-based 4-digit hash prefix to all our data
> > > >>>>>>>> keys. This works fine most of the time, but recently (as often
> > > >>>>>>>> as once or twice a day) one region server suddenly becomes
> > > >>>>>>>> "hot" (CPU above or around 95% in various monitoring tools).
> > > >>>>>>>> When it happens it lasts for hours; occasionally the hotspot
> > > >>>>>>>> jumps to another region server as the master decides a region
> > > >>>>>>>> is unresponsive and gives it to another server.
> > > >>>>>>>>
> > > >>>>>>>> For the longest time we thought this must be some single rogue
> > > >>>>>>>> key in our input data being hammered. All attempts to track
> > > >>>>>>>> this down have failed, though, and the following behavior
> > > >>>>>>>> argues against it being application based:
> > > >>>>>>>>
> > > >>>>>>>> 1. Plotting Get and Put rate by region on the "hot" region
> > > >>>>>>>> server in Cloudera Manager charts shows no single region is an
> > > >>>>>>>> outlier.
> > > >>>>>>>>
> > > >>>>>>>> 2. Cleanly restarting just the region server process causes its
> > > >>>>>>>> regions to randomly migrate to other region servers; it then
> > > >>>>>>>> gets new ones from the HBase master (basically a sort of
> > > >>>>>>>> shuffling), and the hotspot goes away. If it were application
> > > >>>>>>>> based, you'd expect the hotspot to just jump to another region
> > > >>>>>>>> server.
> > > >>>>>>>>
> > > >>>>>>>> 3. We have pored through the region server logs and can't see
> > > >>>>>>>> anything out of the ordinary happening.
> > > >>>>>>>>
> > > >>>>>>>> The only other pertinent thing to mention might be that we have
> > > >>>>>>>> a special process of our own, running outside the cluster, that
> > > >>>>>>>> does cluster-wide major compaction in a rolling fashion, where
> > > >>>>>>>> each batch consists of one region from each region server, and
> > > >>>>>>>> it waits until one batch is completely done before starting
> > > >>>>>>>> another.
> > > >>>>>>>> We have seen no real impact on the hotspot from shutting this
> > > >>>>>>>> down, and in normal times it doesn't impact our read or write
> > > >>>>>>>> performance much.
> > > >>>>>>>>
> > > >>>>>>>> We are at our wit's end. Does anyone have experience with a
> > > >>>>>>>> scenario like this? Any help/guidance would be most
> > > >>>>>>>> appreciated.
> > > >>>>>>>>
> > > >>>>>>>> -----
> > > >>>>>>>> Saad
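
The rolling major-compaction process described in the original message (one region per region server per batch, waiting for each batch before starting the next) has roughly the following shape. This is only an illustrative sketch against the HBase 1.x Admin API, ignoring error handling, security configuration, and regions that move between batches:

    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    /** Sketch of a rolling major compaction: one region per region server per batch. */
    public class RollingMajorCompaction {

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                int batch = 0;
                boolean moreWork = true;
                while (moreWork) {
                    moreWork = false;
                    for (ServerName server : admin.getClusterStatus().getServers()) {
                        List<HRegionInfo> regions = admin.getOnlineRegions(server);
                        if (batch < regions.size()) {
                            HRegionInfo region = regions.get(batch);
                            if (!region.getTable().isSystemTable()) {
                                // Asynchronously request a major compaction of this region.
                                admin.majorCompactRegion(region.getRegionName());
                            }
                            moreWork = true;
                        }
                    }
                    // Crude pacing between batches; a real driver would poll the
                    // per-region compaction state before moving on to the next batch.
                    Thread.sleep(60_000L);
                    batch++;
                }
            }
        }
    }

Staggering major compactions this way avoids the cluster-wide I/O spike of compacting everything at once, which matches the observation above that the external tool has little impact on normal read/write performance.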