hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Hot Region Server With No Hot Region
Date Tue, 13 Dec 2016 20:47:04 GMT
Thanks everyone for the feedback. We tracked this down to a bad design
using dynamic columns: a few (very few) rows had accumulated up to
200,000 dynamic columns. Any activity that caused us to try to read one
of these rows resulted in a hot region server.

Follow-up question: we are now in the process of cleaning up those rows
as we identify them, but some are so big that trying to read them in the
cleanup process kills it with out-of-memory exceptions. Is there any way
to identify rows with too many columns without actually reading them all?
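
One approach we are considering, in case it is useful to others on the
list: a key-only scan with batching, so the client never has to hold a
whole wide row in memory, only count its cells. Below is a rough sketch
against the plain HBase 1.0 client API; the table name and the
10,000-cell threshold are made up, and it is illustrative only, not
tested production code:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowFinder {
      public static void main(String[] args) throws IOException {
        long threshold = 10000;                // flag rows wider than this many cells
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"))) {  // hypothetical table
          Scan scan = new Scan();
          scan.setFilter(new KeyOnlyFilter()); // keys only, values are stripped server side
          scan.setBatch(1000);                 // wide rows come back in chunks of 1000 cells
          scan.setCaching(100);
          scan.setCacheBlocks(false);          // don't churn the block cache for a one-off scan
          byte[] currentRow = null;
          long cellCount = 0;
          try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result r : scanner) {
              if (currentRow == null || !Bytes.equals(currentRow, r.getRow())) {
                if (currentRow != null && cellCount > threshold) {
                  System.out.println(Bytes.toStringBinary(currentRow) + " -> " + cellCount + " cells");
                }
                currentRow = r.getRow();       // chunks of the same row arrive consecutively
                cellCount = 0;
              }
              cellCount += r.rawCells().length;
            }
            if (currentRow != null && cellCount > threshold) {
              System.out.println(Bytes.toStringBinary(currentRow) + " -> " + cellCount + " cells");
            }
          }
        }
      }
    }

The region server still has to read every cell of the wide rows, but the
client only ever holds one batch of keys at a time, so it should not blow
up the way a full-row read does.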

Thanks.

----
Saad


On Sat, Dec 3, 2016 at 3:20 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> I took a look at the stack trace.
>
> Region server log would give us more detail on the frequency and duration
> of compactions.
>
> Cheers
>
> On Sat, Dec 3, 2016 at 7:39 AM, Jeremy Carroll <phobos182@gmail.com>
> wrote:
>
> > I would check compaction, investigate throttling if it's causing high CPU.
> >
> > On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti <saad.mufti@gmail.com> wrote:
> >
> > > No.
> > >
> > > ----
> > > Saad
> > >
> > >
> > > On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu <ted_yu@yahoo.com.invalid> wrote:
> > >
> > > > Somehow I couldn't access the pastebin (I am in China now).
> > > > Did the region server showing the hotspot host meta ?
> > > > Thanks
> > > >
> > > >     On Friday, December 2, 2016 11:53 AM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > > >
> > > >
> > > >  We're in AWS with D2.4xLarge instances. Each instance has 12
> > > > independent spindles/disks from what I can tell.
> > > >
> > > > We have charted get_rate and mutate_rate by host and
> > > >
> > > > a) mutate_rate shows no real outliers
> > > > b) read_rate shows the overall rate on the "hotspot" region server is a
> > > > bit higher than every other server, not severely but enough that it is
> > > > a bit noticeable. But when we chart get_rate on that server by region,
> > > > no one region stands out.
> > > >
> > > > get_rate chart by host:
> > > >
> > > > https://snag.gy/hmoiDw.jpg
> > > >
> > > > mutate_rate chart by host:
> > > >
> > > > https://snag.gy/jitdMN.jpg
> > > >
> > > > ----
> > > > Saad
> > > >
> > > >
> > > > On Fri, Dec 2, 2016 at 2:34 PM, John Leach <jleach@splicemachine.com> wrote:
> > > >
> > > > > Here is what I see...
> > > > >
> > > > >
> > > > > * Short Compaction Running on Heap
> > > > > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.
> > > > > aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547"
-
> > > > Thread
> > > > > t@242
> > > > >    java.lang.Thread.State: RUNNABLE
> > > > >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> > > > >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > internalEncode(FastDiffDeltaEncoder.java:245)
> > > > >    at org.apache.hadoop.hbase.io.encoding.
> BufferedDataBlockEncoder.
> > > > > encode(BufferedDataBlockEncoder.java:987)
> > > > >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > encode(FastDiffDeltaEncoder.java:58)
> > > > >    at org.apache.hadoop.hbase.io
> > > .hfile.HFileDataBlockEncoderImpl.encode(
> > > > > HFileDataBlockEncoderImpl.java:97)
> > > > >    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(
> > > > > HFileBlock.java:866)
> > > > >    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(
> > > > > HFileWriterV2.java:270)
> > > > >    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(
> > > > > HFileWriterV3.java:87)
> > > > >    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.
> > > > > append(StoreFile.java:949)
> > > > >    at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > Compactor.performCompaction(Compactor.java:282)
> > > > >    at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > DefaultCompactor.compact(DefaultCompactor.java:105)
> > > > >    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$
> > > > > DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> > > > >    at org.apache.hadoop.hbase.regionserver.HStore.compact(
> > > > > HStore.java:1233)
> > > > >    at org.apache.hadoop.hbase.regionserver.HRegion.compact(
> > > > > HRegion.java:1770)
> > > > >    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$
> > > > > CompactionRunner.run(CompactSplitThread.java:520)
> > > > >    at java.util.concurrent.ThreadPoolExecutor.runWorker(
> > > > > ThreadPoolExecutor.java:1142)
> > > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > > > > ThreadPoolExecutor.java:617)
> > > > >    at java.lang.Thread.run(Thread.java:745)
> > > > >
> > > > >
> > > > > * WAL Syncs waiting…  ALL 5
> > > > > "sync.0" - Thread t@202
> > > > >    java.lang.Thread.State: TIMED_WAITING
> > > > >    at java.lang.Object.wait(Native Method)
> > > > >    - waiting on <67ba892d> (a java.util.LinkedList)
> > > > >    at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(
> > > > > DFSOutputStream.java:2337)
> > > > >    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(
> > > > > DFSOutputStream.java:2224)
> > > > >    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(
> > > > > DFSOutputStream.java:2116)
> > > > >    at org.apache.hadoop.fs.FSDataOutputStream.hflush(
> > > > > FSDataOutputStream.java:130)
> > > > >    at org.apache.hadoop.hbase.regionserver.wal.
> > ProtobufLogWriter.sync(
> > > > > ProtobufLogWriter.java:173)
> > > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$
> > > > > SyncRunner.run(FSHLog.java:1379)
> > > > >    at java.lang.Thread.run(Thread.java:745)
> > > > >
> > > > > * Mutations backing up very badly...
> > > > >
> > > > > "B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
> > > > >    java.lang.Thread.State: TIMED_WAITING
> > > > >    at java.lang.Object.wait(Native Method)
> > > > >    - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.
> > > > > regionserver.wal.SyncFuture)
> > > > >    at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.
> > > > > get(SyncFuture.java:167)
> > > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.
> > > > > blockOnSync(FSHLog.java:1504)
> > > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.
> > > > > publishSyncThenBlockOnCompletion(FSHLog.java:1498)
> > > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(
> > > > > FSHLog.java:1632)
> > > > >    at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > > syncOrDefer(HRegion.java:7737)
> > > > >    at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > > processRowsWithLocks(HRegion.java:6504)
> > > > >    at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > > mutateRowsWithLocks(HRegion.java:6352)
> > > > >    at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > > mutateRowsWithLocks(HRegion.java:6334)
> > > > >    at org.apache.hadoop.hbase.regionserver.HRegion.
> > > > > mutateRow(HRegion.java:6325)
> > > > >    at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > > mutateRows(RSRpcServices.java:418)
> > > > >    at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > > > > multi(RSRpcServices.java:1916)
> > > > >    at org.apache.hadoop.hbase.protobuf.generated.
> > > > >
> > > ClientProtos$ClientService$2.callBlockingMethod(
> ClientProtos.java:32213)
> > > > >    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:
> > 2034)
> > > > >    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
> > java:107)
> > > > >    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> > > > > RpcExecutor.java:130)
> > > > >    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> > > > java:107)
> > > > >    at java.lang.Thread.run(Thread.java:745)
> > > > >
> > > > >
> > > > > Too many writers being blocked attempting to write to WAL.
> > > > >
> > > > > What does your disk infrastructure look like?  Can you get away with
> > > > > Multi-wal?  Ugh...
> > > > >
> > > > > Regards,
> > > > > John Leach
> > > > >
> > > > >
> > > > > > On Dec 2, 2016, at 1:20 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > > > > >
> > > > > > Hi Ted,
> > > > > >
> > > > > > Finally we have another hotspot going on, same symptoms as before,
> > > > > > here is the pastebin for the stack trace from the region server
> > > > > > that I obtained via VisualVM:
> > > > > >
> > > > > > http://pastebin.com/qbXPPrXk
> > > > > >
> > > > > > Would really appreciate any insight you or anyone else can provide.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > ----
> > > > > > Saad
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > > > > >
> > > > > >> Sure will, the next time it happens.
> > > > > >>
> > > > > >> Thanks!!!
> > > > > >>
> > > > > >> ----
> > > > > >> Saad
> > > > > >>
> > > > > >>
> > > > > >> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu <ted_yu@yahoo.com.invalid> wrote:
> > > > > >>
> > > > > >>> From #2 in the initial email, the hbase:meta might not be the
> > > > > >>> cause for the hotspot.
> > > > > >>>
> > > > > >>> Saad:
> > > > > >>> Can you pastebin stack trace of the hot region server when this
> > > > > >>> happens again ?
> > > > > >>>
> > > > > >>> Thanks
> > > > > >>>
> > > > > >>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>> We used a pre-split into 1024 regions at the start but we
> > > > > >>>> miscalculated our data size, so there were still auto-split
> > > > > >>>> storms at the beginning as data size stabilized; it has ended up
> > > > > >>>> at around 9500 or so regions, plus a few thousand regions for a
> > > > > >>>> few other tables (much smaller). But we haven't had any new
> > > > > >>>> auto-splits in a couple of months. And the hotspots only started
> > > > > >>>> happening recently.
> > > > > >>>>
> > > > > >>>> Our hashing scheme is very simple: we take the MD5 of the key,
> > > > > >>>> then form a 4 digit prefix based on the first two bytes of the
> > > > > >>>> MD5 normalized to be within the range 0-1023. I am fairly
> > > > > >>>> confident about this scheme, especially since even during the
> > > > > >>>> hotspot we see no evidence so far that any particular region is
> > > > > >>>> taking disproportionate traffic (based on Cloudera Manager per
> > > > > >>>> region charts on the hotspot server). Does that look like a
> > > > > >>>> reasonable scheme to randomize which region any given key goes
> > > > > >>>> to? And the start of the hotspot doesn't seem to correspond to
> > > > > >>>> any region splitting or region moving activity.
> > > > > >>>>
> > > > > >>>> Thanks.
> > > > > >>>>
> > > > > >>>> ----
> > > > > >>>> Saad
> > > > > >>>>
> > > > > >>>>
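
(Side note, since the salting scheme came up again above: conceptually it
is just something like the sketch below. This is illustrative only, not
our exact production code; the class name and the ":" separator are made
up.)

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class SaltedKey {
      // 4 digit bucket in the range 0-1023, derived from the first two bytes
      // of the MD5 of the original key, prepended to the key.
      public static String saltedKey(String key) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
        int bucket = (((digest[0] & 0xff) << 8) | (digest[1] & 0xff)) % 1024;
        return String.format("%04d", bucket) + ":" + key;  // ":" separator is illustrative
      }

      public static void main(String[] args) throws NoSuchAlgorithmException {
        // Prints the original key with its 4 digit bucket prefix.
        System.out.println(saltedKey("some-example-key"));
      }
    }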
> > > > > >>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach <jleach@splicemachine.com> wrote:
> > > > > >>>>>
> > > > > >>>>> Saad,
> > > > > >>>>>
> > > > > >>>>> Region move or split causes client connections to
> > > > > >>>>> simultaneously refresh their meta.
> > > > > >>>>>
> > > > > >>>>> Key word is supposed.  We have seen meta hot spotting from time
> > > > > >>>>> to time and on different versions at Splice Machine.
> > > > > >>>>>
> > > > > >>>>> How confident are you in your hashing algorithm?
> > > > > >>>>>
> > > > > >>>>> Regards,
> > > > > >>>>> John Leach
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>> No never thought about that. I just figured out how to locate
> > > > > >>>>>> the server for that table after you mentioned it. We'll have
> > > > > >>>>>> to keep an eye on it next time we have a hotspot to see if it
> > > > > >>>>>> coincides with the hotspot server.
> > > > > >>>>>>
> > > > > >>>>>> What would be the theory for how it could become a hotspot?
> > > > > >>>>>> Isn't the client supposed to cache it and only go back for a
> > > > > >>>>>> refresh if it hits a region that is not in its expected
> > > > > >>>>>> location?
> > > > > >>>>>>
> > > > > >>>>>> ----
> > > > > >>>>>> Saad
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jleach@splicemachine.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Saad,
> > > > > >>>>>>>
> > > > > >>>>>>> Did you validate that Meta is not on the “Hot” region server?
> > > > > >>>>>>>
> > > > > >>>>>>> Regards,
> > > > > >>>>>>> John Leach
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> Hi,
> > > > > >>>>>>>>
> > > > > >>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great
> > > > > >>>>>>>> care to avoid hotspotting due to inadvertent data patterns
> > > > > >>>>>>>> by prepending an MD5 based 4 digit hash prefix to all our
> > > > > >>>>>>>> data keys. This works fine most of the time, but more and
> > > > > >>>>>>>> more (as much as once or twice a day) recently we have
> > > > > >>>>>>>> occasions where one region server suddenly becomes "hot"
> > > > > >>>>>>>> (CPU above or around 95% in various monitoring tools). When
> > > > > >>>>>>>> it happens it lasts for hours, and occasionally the hotspot
> > > > > >>>>>>>> might jump to another region server as the master decides
> > > > > >>>>>>>> the region is unresponsive and gives its region to another
> > > > > >>>>>>>> server.
> > > > > >>>>>>>>
> > > > > >>>>>>>> For the longest time, we thought this must be some single
> > > > > >>>>>>>> rogue key in our input data that is being hammered. All
> > > > > >>>>>>>> attempts to track this down have failed though, and the
> > > > > >>>>>>>> following behavior argues against this being application
> > > > > >>>>>>>> based:
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1. plotting Get and Put rate by region on the "hot" region
> > > > > >>>>>>>> server in Cloudera Manager charts shows no single region is
> > > > > >>>>>>>> an outlier.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2. cleanly restarting just the region server process causes
> > > > > >>>>>>>> its regions to randomly migrate to other region servers,
> > > > > >>>>>>>> then it gets new ones from the HBase master, basically a
> > > > > >>>>>>>> sort of shuffling, and then the hotspot goes away. If it
> > > > > >>>>>>>> were application based, you'd expect the hotspot to just
> > > > > >>>>>>>> jump to another region server.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3. we have pored through region server logs and can't see
> > > > > >>>>>>>> anything out of the ordinary happening.
> > > > > >>>>>>>>
> > > > > >>>>>>>> The only other pertinent thing to mention might be that we
> > > > > >>>>>>>> have a special process of our own running outside the
> > > > > >>>>>>>> cluster that does cluster wide major compaction in a rolling
> > > > > >>>>>>>> fashion, where each batch consists of one region from each
> > > > > >>>>>>>> region server, and it waits until one batch is completely
> > > > > >>>>>>>> done before starting another. We have seen no real impact on
> > > > > >>>>>>>> the hotspot from shutting this down, and in normal times it
> > > > > >>>>>>>> doesn't impact our read or write performance much.
> > > > > >>>>>>>>
> > > > > >>>>>>>> We are at our wit's end, anyone have experience with a
> > > > > >>>>>>>> scenario like this? Any help/guidance would be most
> > > > > >>>>>>>> appreciated.
> > > > > >>>>>>>>
> > > > > >>>>>>>> -----
> > > > > >>>>>>>> Saad
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>
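
(For context on the rolling major compaction process described in the
original message above: conceptually it is a loop like the sketch below
over the Admin API, one region per server per batch. This is a simplified
illustration assuming the HBase 1.0 Admin interface, not our actual tool;
the table name is made up, and the waiting and error handling the real
process does is only indicated in a comment.)

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RollingMajorCompactor {
      public static void main(String[] args) throws IOException {
        TableName table = TableName.valueOf("my_table");   // hypothetical table
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // Group the table's regions by the server currently hosting them.
          Map<ServerName, List<HRegionInfo>> byServer = new HashMap<ServerName, List<HRegionInfo>>();
          for (ServerName sn : admin.getClusterStatus().getServers()) {
            List<HRegionInfo> regions = new ArrayList<HRegionInfo>();
            for (HRegionInfo hri : admin.getOnlineRegions(sn)) {
              if (hri.getTable().equals(table)) {
                regions.add(hri);
              }
            }
            byServer.put(sn, regions);
          }
          // Each batch requests a major compaction of at most one region per server.
          boolean moreToDo = true;
          for (int batch = 0; moreToDo; batch++) {
            moreToDo = false;
            for (List<HRegionInfo> regions : byServer.values()) {
              if (batch < regions.size()) {
                admin.majorCompactRegion(regions.get(batch).getRegionName());
                moreToDo = true;
              }
            }
            // The real process polls compaction state here and waits for the
            // whole batch to finish before moving on to the next one.
          }
        }
      }
    }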
