hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Hot Region Server With No Hot Region
Date Fri, 02 Dec 2016 19:52:43 GMT
We're in AWS with D2.4xLarge instances. Each instance has 12 independent
spindles/disks from what I can tell.

We have charted get_rate and mutate_rate by host:

a) mutate_rate shows no real outliers.
b) get_rate shows the overall rate on the "hotspot" region server is somewhat
higher than on every other server, not severely but noticeably. When we chart
get_rate on that server by region, though, no single region stands out.

get_rate chart by host:

https://snag.gy/hmoiDw.jpg

mutate_rate chart by host:

https://snag.gy/jitdMN.jpg

----
Saad

On Fri, Dec 2, 2016 at 2:34 PM, John Leach <jleach@splicemachine.com> wrote:

> Here is what I see...
>
>
> * Short Compaction Running on Heap
> "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
>     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
>     at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
>     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
>     at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
>     at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
>     at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
>     at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
>     at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
>     at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
>
> * WAL Syncs waiting…   ALL 5
> "sync.0" - Thread t@202
>    java.lang.Thread.State: TIMED_WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <67ba892d> (a java.util.LinkedList)
>     at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
>     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
>     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
>     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
>     at java.lang.Thread.run(Thread.java:745)
>
> * Mutations backing up very badly...
>
> "B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
>    java.lang.Thread.State: TIMED_WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
>     at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
>     at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
>     at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
>     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
>     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
>
>
> Too many writers being blocked attempting to write to WAL.
>
> What does your disk infrastructure look like?  Can you get away with
> multi-WAL?  Ugh...
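For reference, multi-WAL on HBase 1.x is enabled with configuration along these lines. This is only a sketch: check that your specific CDH build ships the region-grouping WAL provider, and treat the group count as a tunable, not a recommendation.

```xml
<!-- hbase-site.xml: split the single per-server WAL into several,
     so syncs are not all serialized behind one HDFS pipeline. -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
<property>
  <!-- Bind each region to one of a fixed number of WAL groups. -->
  <name>hbase.wal.regiongrouping.strategy</name>
  <value>bounded</value>
</property>
<property>
  <name>hbase.wal.regiongrouping.numgroups</name>
  <value>4</value>
</property>
```

With independent spindles, more WAL groups can spread sync latency across disks; with a shared or network-backed disk layer the gain may be small.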
>
> Regards,
> John Leach
>
>
> > On Dec 2, 2016, at 1:20 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >
> > Hi Ted,
> >
> > Finally we have another hotspot going on, same symptoms as before. Here is
> > the pastebin for the stack trace from the region server, which I obtained
> > via VisualVM:
> >
> > http://pastebin.com/qbXPPrXk
> >
> > Would really appreciate any insight you or anyone else can provide.
> >
> > Thanks.
> >
> > ----
> > Saad
> >
> >
> > On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >
> >> Sure will, the next time it happens.
> >>
> >> Thanks!!!
> >>
> >> ----
> >> Saad
> >>
> >>
> >> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu <ted_yu@yahoo.com.invalid> wrote:
> >>
> >>> Based on #2 in the initial email, hbase:meta might not be the cause of
> >>> the hotspot.
> >>>
> >>> Saad:
> >>> Can you pastebin the stack trace of the hot region server when this
> >>> happens again?
> >>>
> >>> Thanks
> >>>
> >>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >>>>
> >>>> We used a pre-split into 1024 regions at the start, but we miscalculated
> >>>> our data size, so there were still auto-split storms at the beginning as
> >>>> the data size stabilized. It has ended up at around 9500 or so regions,
> >>>> plus a few thousand regions for a few other tables (much smaller). But
> >>>> we haven't had any new auto-splits in a couple of months, and the
> >>>> hotspots only started happening recently.
> >>>>
> >>>> Our hashing scheme is very simple: we take the MD5 of the key, then
> >>>> form a 4-digit prefix based on the first two bytes of the MD5,
> >>>> normalized to be within the range 0-1023. I am fairly confident about
> >>>> this scheme, especially since even during the hotspot we see no evidence
> >>>> so far that any particular region is taking disproportionate traffic
> >>>> (based on Cloudera Manager per-region charts on the hotspot server).
> >>>> Does that look like a reasonable scheme to randomize which region any
> >>>> given key goes to? And the start of the hotspot doesn't seem to
> >>>> correspond to any region splitting or moving from one server to another.
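For what it's worth, the prefixing scheme described above can be sketched as follows. Python is used only for illustration, the function name is mine, and reducing the 2-byte value mod 1024 is one plausible reading of "normalized to be within the range 0-1023":

```python
import hashlib

def salted_key(key: bytes) -> bytes:
    """Prepend a 4-digit bucket prefix (0000-1023) derived from MD5(key)."""
    digest = hashlib.md5(key).digest()
    # First two bytes give a value in 0..65535; reduce into 0..1023.
    bucket = int.from_bytes(digest[:2], "big") % 1024
    # Zero-padded so all prefixes have equal width and sort cleanly.
    return b"%04d" % bucket + key
```

Because MD5 output is effectively uniform, sequential source keys scatter roughly evenly over the 1024 buckets, which matches a 1024-way pre-split.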
> >>>>
> >>>> Thanks.
> >>>>
> >>>> ----
> >>>> Saad
> >>>>
> >>>>
> >>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach <jleach@splicemachine.com> wrote:
> >>>>>
> >>>>> Saad,
> >>>>>
> >>>>> A region move or split causes client connections to simultaneously
> >>>>> refresh their meta.
> >>>>>
> >>>>> The key word is "supposed". We have seen meta hotspotting from time to
> >>>>> time, and on different versions, at Splice Machine.
> >>>>>
> >>>>> How confident are you in your hashing algorithm?
> >>>>>
> >>>>> Regards,
> >>>>> John Leach
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >>>>>>
> >>>>>> No, never thought about that. I just figured out how to locate the
> >>>>>> server for that table after you mentioned it. We'll have to keep an
> >>>>>> eye on it next time we have a hotspot, to see if it coincides with the
> >>>>>> hotspot server.
> >>>>>>
> >>>>>> What would be the theory for how it could become a hotspot? Isn't the
> >>>>>> client supposed to cache it and only go back for a refresh if it hits
> >>>>>> a region that is not in its expected location?
> >>>>>>
> >>>>>> ----
> >>>>>> Saad
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jleach@splicemachine.com> wrote:
> >>>>>>
> >>>>>>> Saad,
> >>>>>>>
> >>>>>>> Did you validate that Meta is not on the “Hot” region server?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> John Leach
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to
> >>>>>>>> avoid hotspotting due to inadvertent data patterns by prepending an
> >>>>>>>> MD5-based 4-digit hash prefix to all our data keys. This works fine
> >>>>>>>> most of the time, but recently, as much as once or twice a day, we
> >>>>>>>> have occasions where one region server suddenly becomes "hot" (CPU
> >>>>>>>> above or around 95% in various monitoring tools). When it happens it
> >>>>>>>> lasts for hours; occasionally the hotspot jumps to another region
> >>>>>>>> server as the master decides a region is unresponsive and gives it
> >>>>>>>> to another server.
> >>>>>>>>
> >>>>>>>> For the longest time, we thought this must be some single rogue key
> >>>>>>>> in our input data that is being hammered. All attempts to track this
> >>>>>>>> down have failed, though, and the following behavior argues against
> >>>>>>>> this being application-based:
> >>>>>>>>
> >>>>>>>> 1. Plotting Get and Put rate by region on the "hot" region server
> >>>>>>>> in Cloudera Manager charts shows no single region is an outlier.
> >>>>>>>>
> >>>>>>>> 2. Cleanly restarting just the region server process causes its
> >>>>>>>> regions to randomly migrate to other region servers; it then gets
> >>>>>>>> new ones from the HBase master, basically a sort of shuffling, and
> >>>>>>>> the hotspot goes away. If it were application-based, you'd expect
> >>>>>>>> the hotspot to just jump to another region server.
> >>>>>>>>
> >>>>>>>> 3. We have pored through the region server logs and can't see
> >>>>>>>> anything out of the ordinary happening.
> >>>>>>>>
> >>>>>>>> The only other pertinent thing to mention might be that we have a
> >>>>>>>> special process of our own, running outside the cluster, that does
> >>>>>>>> cluster-wide major compaction in a rolling fashion, where each batch
> >>>>>>>> consists of one region from each region server, and it waits until
> >>>>>>>> one batch is completely done before starting another. We have seen
> >>>>>>>> no real impact on the hotspot from shutting this down, and in normal
> >>>>>>>> times it doesn't impact our read or write performance much.
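The rolling scheme described above (at most one region per server in flight at a time) boils down to batching logic like the following sketch. The input shape and names are hypothetical, and the actual major-compaction call (e.g. via the HBase shell's major_compact) is left out:

```python
from typing import Dict, List

def rolling_batches(regions_by_server: Dict[str, List[str]]) -> List[List[str]]:
    """Group regions into batches with at most one region per region server."""
    batches: List[List[str]] = []
    i = 0
    while True:
        # Take the i-th region from every server that still has one left.
        batch = [regs[i] for regs in regions_by_server.values() if i < len(regs)]
        if not batch:
            break
        batches.append(batch)
        i += 1
    return batches
```

Each batch would then be submitted and awaited before the next one starts, capping concurrent major compactions at one per server.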
> >>>>>>>>
> >>>>>>>> We are at our wit's end. Does anyone have experience with a
> >>>>>>>> scenario like this? Any help/guidance would be most appreciated.
> >>>>>>>>
> >>>>>>>> -----
> >>>>>>>> Saad
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >>
>
>
