hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Hot Region Server With No Hot Region
Date Sat, 03 Dec 2016 14:19:42 GMT
No.

----
Saad


On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu <ted_yu@yahoo.com.invalid> wrote:

> Somehow I couldn't access the pastebin (I am in China now).
> Did the region server showing the hotspot host meta?
> Thanks
>
>     On Friday, December 2, 2016 11:53 AM, Saad Mufti <saad.mufti@gmail.com>
> wrote:
>
>
> We're in AWS with D2.4xLarge instances. Each instance has 12 independent
> spindles/disks from what I can tell.
>
> We have charted get_rate and mutate_rate by host:
>
> a) mutate_rate shows no real outliers.
> b) get_rate shows that the overall rate on the "hotspot" region server is
> a bit higher than on every other server; not severely, but enough to be
> noticeable. But when we chart get_rate on that server by region, no one
> region stands out.
>
> get_rate chart by host:
>
> https://snag.gy/hmoiDw.jpg
>
> mutate_rate chart by host:
>
> https://snag.gy/jitdMN.jpg
>
> ----
> Saad
>
>
> On Fri, Dec 2, 2016 at 2:34 PM, John Leach <jleach@splicemachine.com>
> wrote:
>
> > Here is what I see...
> >
> >
> > * Short Compaction Running on Heap
> > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
> >    java.lang.Thread.State: RUNNABLE
> >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
> >    at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
> >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
> >    at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
> >    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
> >    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
> >    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> >    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
> >    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
> >    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
> >    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> >    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
> >    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
> >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >    at java.lang.Thread.run(Thread.java:745)
> >
> >
> > * WAL Syncs waiting…  ALL 5
> > "sync.0" - Thread t@202
> >    java.lang.Thread.State: TIMED_WAITING
> >    at java.lang.Object.wait(Native Method)
> >    - waiting on <67ba892d> (a java.util.LinkedList)
> >    at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
> >    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
> >    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
> >    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
> >    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
> >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
> >    at java.lang.Thread.run(Thread.java:745)
> >
> > * Mutations backing up very badly...
> >
> > "B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
> >    java.lang.Thread.State: TIMED_WAITING
> >    at java.lang.Object.wait(Native Method)
> >    - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
> >    at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
> >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
> >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
> >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
> >    at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
> >    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
> >    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> >    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
> >    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> >    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> >    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >    at java.lang.Thread.run(Thread.java:745)
> >
> >
> > Too many writers being blocked attempting to write to WAL.
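
One way to quantify "too many writers being blocked" is to count the RPC handler threads that are parked in SyncFuture.get in a saved thread dump. The following is only a minimal sketch: it assumes a raw (not email-wrapped) dump in which thread entries are separated by blank lines, and the class and method names (WalSyncWaiters, countHandlersBlockedOnWalSync) are made up for illustration, not part of HBase.

```java
class WalSyncWaiters {

    // Counts thread entries that look like RPC handlers waiting on a WAL sync.
    // Assumption: thread entries in the dump are separated by blank lines.
    static int countHandlersBlockedOnWalSync(String dump) {
        int count = 0;
        // Split on a line break, optional whitespace, and another line break.
        for (String entry : dump.split("\\R\\s*\\R")) {
            boolean isHandler = entry.contains("RpcServer.handler=");
            boolean waitsOnSync = entry.contains("SyncFuture.get(");
            if (isHandler && waitsOnSync) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Tiny hand-made dump in the same layout as the traces quoted above.
        String dump =
              "\"B.defaultRpcServer.handler=103,queue=7,port=60020\" - Thread t@155\n"
            + "   java.lang.Thread.State: TIMED_WAITING\n"
            + "   at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)\n"
            + "\n"
            + "\"sync.0\" - Thread t@202\n"
            + "   java.lang.Thread.State: TIMED_WAITING\n"
            + "   at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)\n";
        System.out.println("handlers blocked on WAL sync: "
            + countHandlersBlockedOnWalSync(dump));
    }
}
```

Comparing this count between the hot server and a healthy one, from dumps taken at about the same time, would show whether the WAL backlog is specific to the hot server.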
> >
> > What does your disk infrastructure look like?  Can you get away with
> > Multi-wal?  Ugh...
> >
> > Regards,
> > John Leach
> >
> >
> > > On Dec 2, 2016, at 1:20 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > >
> > > Hi Ted,
> > >
> > > Finally we have another hotspot going on, same symptoms as before, here is
> > > the pastebin for the stack trace from the region server that I obtained via
> > > VisualVM:
> > >
> > > http://pastebin.com/qbXPPrXk
> > >
> > > Would really appreciate any insight you or anyone else can provide.
> > >
> > > Thanks.
> > >
> > > ----
> > > Saad
> > >
> > >
> > > On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > >
> > >> Sure will, the next time it happens.
> > >>
> > >> Thanks!!!
> > >>
> > >> ----
> > >> Saad
> > >>
> > >>
> > >> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu <ted_yu@yahoo.com.invalid> wrote:
> > >>
> > >>> From #2 in the initial email, hbase:meta might not be the cause of
> > >>> the hotspot.
> > >>>
> > >>> Saad:
> > >>> Can you pastebin the stack trace of the hot region server when this
> > >>> happens again?
> > >>>
> > >>> Thanks
> > >>>
> > >>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > >>>>
> > >>>> We pre-split into 1024 regions at the start, but we miscalculated our
> > >>>> data size, so there were still auto-split storms at the beginning. As
> > >>>> the data size stabilized, the table ended up at around 9500 or so
> > >>>> regions, plus a few thousand regions for a few other tables (much
> > >>>> smaller). We haven't had any new auto-splits in a couple of months,
> > >>>> and the hotspots only started happening recently.
> > >>>>
> > >>>> Our hashing scheme is very simple: we take the MD5 of the key, then form
> > >>>> a 4 digit prefix based on the first two bytes of the MD5, normalized to
> > >>>> be within the range 0-1023. I am fairly confident about this scheme,
> > >>>> especially since even during the hotspot we see no evidence so far that
> > >>>> any particular region is taking disproportionate traffic (based on
> > >>>> Cloudera Manager per-region charts on the hotspot server). Does that
> > >>>> look like a reasonable scheme to randomize which region any given key
> > >>>> goes to? Also, the start of the hotspot doesn't seem to correspond to
> > >>>> any region splitting or any region moving from one server to another.
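
The prefix scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the message does not say how the first two MD5 bytes are "normalized" to 0-1023, so a modulo is assumed here, the key is assumed to be a UTF-8 string, and the names SaltedKey, prefixFor and rowKeyFor are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SaltedKey {

    // Matches the 1024-region pre-split mentioned in the thread.
    static final int BUCKETS = 1024;

    // Returns a zero-padded 4 digit prefix in the range 0000-1023.
    static String prefixFor(String key) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Combine the first two digest bytes into an unsigned value 0-65535.
            int firstTwo = ((md5[0] & 0xFF) << 8) | (md5[1] & 0xFF);
            // Assumed normalization: modulo the bucket count.
            int bucket = firstTwo % BUCKETS;
            return String.format("%04d", bucket);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Row key = prefix followed by the original key.
    static String rowKeyFor(String key) {
        return prefixFor(key) + key;
    }

    public static void main(String[] args) {
        System.out.println(rowKeyFor("user-12345"));
    }
}
```

Because 65536 is an exact multiple of 1024, taking two uniformly distributed bytes modulo 1024 introduces no bias between buckets, which is consistent with the observation that no single region stands out.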
> > >>>>
> > >>>> Thanks.
> > >>>>
> > >>>> ----
> > >>>> Saad
> > >>>>
> > >>>>
> > >>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach <jleach@splicemachine.com> wrote:
> > >>>>>
> > >>>>> Saad,
> > >>>>>
> > >>>>> Region move or split causes client connections to simultaneously
> > >>>>> refresh their meta.
> > >>>>>
> > >>>>> Key word is "supposed". We have seen meta hot spotting from time to
> > >>>>> time and on different versions at Splice Machine.
> > >>>>>
> > >>>>> How confident are you in your hashing algorithm?
> > >>>>>
> > >>>>> Regards,
> > >>>>> John Leach
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > >>>>>>
> > >>>>>> No, never thought about that. I just figured out how to locate the
> > >>>>>> server for that table after you mentioned it. We'll have to keep an
> > >>>>>> eye on it next time we have a hotspot to see if it coincides with the
> > >>>>>> hotspot server.
> > >>>>>>
> > >>>>>> What would be the theory for how it could become a hotspot? Isn't the
> > >>>>>> client supposed to cache it and only go back for a refresh if it hits
> > >>>>>> a region that is not in its expected location?
> > >>>>>>
> > >>>>>> ----
> > >>>>>> Saad
> > >>>>>>
> > >>>>>>
> > >>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jleach@splicemachine.com> wrote:
> > >>>>>>
> > >>>>>>> Saad,
> > >>>>>>>
> > >>>>>>> Did you validate that Meta is not on the “Hot” region server?
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> John Leach
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid
> > >>>>>>>> hotspotting due to inadvertent data patterns by prepending an
> > >>>>>>>> MD5-based 4 digit hash prefix to all our data keys. This works fine
> > >>>>>>>> most of the time, but recently, more and more often (as much as once
> > >>>>>>>> or twice a day), we have occasions where one region server suddenly
> > >>>>>>>> becomes "hot" (CPU above or around 95% in various monitoring tools).
> > >>>>>>>> When it happens it lasts for hours. Occasionally the hotspot might
> > >>>>>>>> jump to another region server as the master decides the region is
> > >>>>>>>> unresponsive and gives its region to another server.
> > >>>>>>>>
> > >>>>>>>> For the longest time, we thought this must be some single rogue key in
> > >>>>>>>> our input data that is being hammered. All attempts to track this down
> > >>>>>>>> have failed though, and the following behavior argues against this
> > >>>>>>>> being application based:
> > >>>>>>>>
> > >>>>>>>> 1. Plotting Get and Put rate by region on the "hot" region server in
> > >>>>>>>> Cloudera Manager Charts shows no single region is an outlier.
> > >>>>>>>>
> > >>>>>>>> 2. Cleanly restarting just the region server process causes its
> > >>>>>>>> regions to randomly migrate to other region servers, then it gets new
> > >>>>>>>> ones from the HBase master, basically a sort of shuffling, and then
> > >>>>>>>> the hotspot goes away. If it were application based, you'd expect the
> > >>>>>>>> hotspot to just jump to another region server.
> > >>>>>>>>
> > >>>>>>>> 3. We have pored through region server logs and can't see anything out
> > >>>>>>>> of the ordinary happening.
> > >>>>>>>>
> > >>>>>>>> The only other pertinent thing to mention might be that we have a
> > >>>>>>>> special process of our own running outside the cluster that does
> > >>>>>>>> cluster-wide major compaction in a rolling fashion, where each batch
> > >>>>>>>> consists of one region from each region server, and it waits until one
> > >>>>>>>> batch is completely done before starting another. We have seen no real
> > >>>>>>>> impact on the hotspot from shutting this down, and in normal times it
> > >>>>>>>> doesn't impact our read or write performance much.
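
The batching rule described above (one region per region server per batch) could be planned along these lines. This is a hedged sketch of the grouping step only, not the poster's actual tool: the class name RollingCompactionPlan and its input shape (a map from server name to region names) are assumptions, and issuing the compaction requests and waiting for each batch to finish are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RollingCompactionPlan {

    // Builds batches so that each batch holds at most one region per server.
    // Batch i contains the i-th region of every server that has one.
    static List<List<String>> batches(Map<String, List<String>> regionsByServer) {
        int maxRegions = 0;
        for (List<String> regions : regionsByServer.values()) {
            maxRegions = Math.max(maxRegions, regions.size());
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < maxRegions; i++) {
            List<String> batch = new ArrayList<>();
            for (List<String> regions : regionsByServer.values()) {
                if (i < regions.size()) {
                    batch.add(regions.get(i));
                }
            }
            result.add(batch);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical server and region names, for illustration only.
        Map<String, List<String>> regionsByServer = new LinkedHashMap<>();
        regionsByServer.put("rs1", Arrays.asList("region-a", "region-b"));
        regionsByServer.put("rs2", Arrays.asList("region-c"));
        for (List<String> batch : batches(regionsByServer)) {
            // A real tool would request major compaction for each region in
            // the batch here and wait for all of them before continuing.
            System.out.println(batch);
        }
    }
}
```

With this layout no server compacts more than one region at a time, so the extra compaction load per server stays roughly even across the cluster.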
> > >>>>>>>>
> > >>>>>>>> We are at our wit's end. Does anyone have experience with a scenario
> > >>>>>>>> like this? Any help/guidance would be most appreciated.
> > >>>>>>>>
> > >>>>>>>> -----
> > >>>>>>>> Saad
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>
> > >>
> >
> >
>
>
>
