From: Jeremy Carroll
Date: Sat, 03 Dec 2016 15:39:36 +0000
Subject: Re: Hot Region Server With No Hot Region
To: user@hbase.apache.org

I would check compaction, and investigate throttling if it's causing high CPU.

On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti wrote:

> No.
>
> ----
> Saad
>
>
> On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu wrote:
>
> > Somehow I couldn't access the pastebin (I am in China now).
> > Is the region server showing the hotspot also hosting meta?
> > Thanks
> >
> > On Friday, December 2, 2016 11:53 AM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >
> > We're in AWS with d2.4xlarge instances. Each instance has 12 independent
> > spindles/disks from what I can tell.
> >
> > We have charted get_rate and mutate_rate by host, and:
> >
> > a) mutate_rate shows no real outliers
> > b) get_rate shows the overall rate on the "hotspot" region server is a bit
> > higher than on every other server, not severely but enough to be
> > noticeable. But when we chart get_rate on that server by region, no one
> > region stands out.
> >
> > get_rate chart by host:
> >
> > https://snag.gy/hmoiDw.jpg
> >
> > mutate_rate chart by host:
> >
> > https://snag.gy/jitdMN.jpg
> >
> > ----
> > Saad
> >
> >
> > On Fri, Dec 2, 2016 at 2:34 PM, John Leach wrote:
> >
> > > Here is what I see...
> > >
> > > * Short compaction running on heap
> > >
> > > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
> > >   java.lang.Thread.State: RUNNABLE
> > >     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> > >     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
> > >     at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
> > >     at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
> > >     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> > >     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
> > >     at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
> > >     at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
> > >     at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> > >     at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
> > >     at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > >     at java.lang.Thread.run(Thread.java:745)
> > >
> > > * WAL syncs waiting... ALL 5
> > >
> > > "sync.0" - Thread t@202
> > >   java.lang.Thread.State: TIMED_WAITING
> > >     at java.lang.Object.wait(Native Method)
> > >     - waiting on <67ba892d> (a java.util.LinkedList)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
> > >     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
> > >     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
> > >     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
> > >     at java.lang.Thread.run(Thread.java:745)
> > >
> > > * Mutations backing up very badly...
> > >
> > > "B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
> > >   java.lang.Thread.State: TIMED_WAITING
> > >     at java.lang.Object.wait(Native Method)
> > >     - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
> > >     at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
> > >     at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
> > >     at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
> > >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
> > >     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
> > >     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > >     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > >     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > >     at java.lang.Thread.run(Thread.java:745)
> > >
> > > Too many writers being blocked attempting to write to WAL.
> > >
> > > What does your disk infrastructure look like? Can you get away with
> > > multi-WAL? Ugh...
> > >
> > > Regards,
> > > John Leach
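
For context on the multi-WAL suggestion: MultiWAL lets a region server run several WAL pipelines in parallel so that syncs are spread across more than one HDFS stream. A minimal sketch of enabling it, assuming a release with MultiWAL support and the property names documented in the HBase reference guide (verify both against the exact HBase/CDH version in use):

    <!-- hbase-site.xml on each region server: sketch of enabling MultiWAL -->
    <property>
      <name>hbase.wal.provider</name>
      <value>multiwal</value>
    </property>
    <!-- optional: number of WAL groups (pipelines) per region server -->
    <property>
      <name>hbase.wal.regiongrouping.numgroups</name>
      <value>2</value>
    </property>

Each region is mapped to one WAL group, so a single extremely hot region still syncs through one pipeline; the gain comes from spreading many regions' edits across several sync threads, which is the "all sync threads busy" pattern visible in the stack traces above.
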
> > > > On Dec 2, 2016, at 1:20 PM, Saad Mufti wrote:
> > > >
> > > > Hi Ted,
> > > >
> > > > Finally we have another hotspot going on, with the same symptoms as
> > > > before. Here is the pastebin for the stack trace from the region
> > > > server, which I obtained via VisualVM:
> > > >
> > > > http://pastebin.com/qbXPPrXk
> > > >
> > > > Would really appreciate any insight you or anyone else can provide.
> > > >
> > > > Thanks.
> > > >
> > > > ----
> > > > Saad
> > > >
> > > >
> > > > On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti wrote:
> > > >
> > > >> Sure will, the next time it happens.
> > > >>
> > > >> Thanks!!!
> > > >>
> > > >> ----
> > > >> Saad
> > > >>
> > > >>
> > > >> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu wrote:
> > > >>
> > > >>> From #2 in the initial email, hbase:meta might not be the cause of
> > > >>> the hotspot.
> > > >>>
> > > >>> Saad:
> > > >>> Can you pastebin a stack trace of the hot region server when this
> > > >>> happens again?
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti wrote:
> > > >>>>
> > > >>>> We used a pre-split into 1024 regions at the start, but we
> > > >>>> miscalculated our data size, so there were still auto-split storms
> > > >>>> at the beginning as the data size stabilized. The table has ended
> > > >>>> up at around 9500 or so regions, plus a few thousand regions for a
> > > >>>> few other (much smaller) tables. But we haven't had any new
> > > >>>> auto-splits in a couple of months, and the hotspots only started
> > > >>>> happening recently.
> > > >>>>
> > > >>>> Our hashing scheme is very simple: we take the MD5 of the key, then
> > > >>>> form a 4-digit prefix based on the first two bytes of the MD5,
> > > >>>> normalized to be within the range 0-1023. I am fairly confident
> > > >>>> about this scheme, especially since even during the hotspot we see
> > > >>>> no evidence so far that any particular region is taking
> > > >>>> disproportionate traffic (based on the Cloudera Manager per-region
> > > >>>> charts for the hotspot server). Does that look like a reasonable
> > > >>>> scheme to randomize which region any given key goes to? Also, the
> > > >>>> start of the hotspot doesn't seem to correspond to any region
> > > >>>> splitting or moving from one server to another.
> > > >>>>
> > > >>>> Thanks.
> > > >>>>
> > > >>>> ----
> > > >>>> Saad
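
The salting described above amounts to something like the following sketch (hypothetical; the actual key encoding and normalization in the application may differ):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    /** Sketch of an MD5-based 4-digit salt prefix in the range 0000-1023. */
    public final class SaltedKey {

        /** Prepends a 4-digit bucket derived from the first two MD5 bytes of the key. */
        public static String salt(String key) {
            try {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
                // Combine the first two digest bytes and normalize into 0..1023.
                int bucket = (((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF)) % 1024;
                return String.format("%04d%s", bucket, key);
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        }

        public static void main(String[] args) {
            // Prints the salted form of a sample key.
            System.out.println(salt("example-row-key"));
        }
    }

A scheme of this shape spreads keys essentially uniformly across the 1024 buckets as long as traffic is spread over many distinct keys, which is consistent with the observation that no single region stands out during the hotspot.
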
> > > >>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach <jleach@splicemachine.com> wrote:
> > > >>>>>
> > > >>>>> Saad,
> > > >>>>>
> > > >>>>> A region move or split causes client connections to simultaneously
> > > >>>>> refresh their meta.
> > > >>>>>
> > > >>>>> The key word is "supposed". We have seen meta hotspotting from
> > > >>>>> time to time, and on different versions, at Splice Machine.
> > > >>>>>
> > > >>>>> How confident are you in your hashing algorithm?
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> John Leach
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti wrote:
> > > >>>>>>
> > > >>>>>> No, never thought about that. I just figured out how to locate
> > > >>>>>> the server for that table after you mentioned it. We'll have to
> > > >>>>>> keep an eye on it the next time we have a hotspot, to see if it
> > > >>>>>> coincides with the hotspot server.
> > > >>>>>>
> > > >>>>>> What would be the theory for how it could become a hotspot? Isn't
> > > >>>>>> the client supposed to cache it and only go back for a refresh if
> > > >>>>>> it hits a region that is not in its expected location?
> > > >>>>>>
> > > >>>>>> ----
> > > >>>>>> Saad
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jleach@splicemachine.com> wrote:
> > > >>>>>>
> > > >>>>>>> Saad,
> > > >>>>>>>
> > > >>>>>>> Did you validate that meta is not on the "hot" region server?
> > > >>>>>>>
> > > >>>>>>> Regards,
> > > >>>>>>> John Leach
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti wrote:
> > > >>>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care
> > > >>>>>>>> to avoid hotspotting due to inadvertent data patterns by
> > > >>>>>>>> prepending an MD5-based 4-digit hash prefix to all our data
> > > >>>>>>>> keys. This works fine most of the time, but recently (as often
> > > >>>>>>>> as once or twice a day) one region server suddenly becomes
> > > >>>>>>>> "hot" (CPU above or around 95% in various monitoring tools).
> > > >>>>>>>> When it happens it lasts for hours; occasionally the hotspot
> > > >>>>>>>> jumps to another region server as the master decides a region
> > > >>>>>>>> is unresponsive and gives it to another server.
> > > >>>>>>>>
> > > >>>>>>>> For the longest time we thought this must be some single rogue
> > > >>>>>>>> key in our input data being hammered. All attempts to track
> > > >>>>>>>> this down have failed, though, and the following behavior
> > > >>>>>>>> argues against it being application based:
> > > >>>>>>>>
> > > >>>>>>>> 1. Plotting Get and Put rate by region on the "hot" region
> > > >>>>>>>> server in Cloudera Manager charts shows no single region is an
> > > >>>>>>>> outlier.
> > > >>>>>>>>
> > > >>>>>>>> 2. Cleanly restarting just the region server process causes its
> > > >>>>>>>> regions to randomly migrate to other region servers; it then
> > > >>>>>>>> gets new ones from the HBase master (basically a sort of
> > > >>>>>>>> shuffling), and the hotspot goes away. If it were application
> > > >>>>>>>> based, you'd expect the hotspot to just jump to another region
> > > >>>>>>>> server.
> > > >>>>>>>>
> > > >>>>>>>> 3. We have pored through the region server logs and can't see
> > > >>>>>>>> anything out of the ordinary happening.
> > > >>>>>>>>
> > > >>>>>>>> The only other pertinent thing to mention might be that we have
> > > >>>>>>>> a special process of our own, running outside the cluster, that
> > > >>>>>>>> does cluster-wide major compaction in a rolling fashion, where
> > > >>>>>>>> each batch consists of one region from each region server, and
> > > >>>>>>>> it waits until one batch is completely done before starting
> > > >>>>>>>> another.
> > > >>>>>>>> We have seen no real impact on the hotspot from shutting this
> > > >>>>>>>> down, and in normal times it doesn't impact our read or write
> > > >>>>>>>> performance much.
> > > >>>>>>>>
> > > >>>>>>>> We are at our wit's end. Does anyone have experience with a
> > > >>>>>>>> scenario like this? Any help/guidance would be most
> > > >>>>>>>> appreciated.
> > > >>>>>>>>
> > > >>>>>>>> -----
> > > >>>>>>>> Saad
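
The rolling major-compaction process described in the original message (one region per region server per batch, waiting for each batch before starting the next) has roughly the following shape. This is only an illustrative sketch against the HBase 1.x Admin API, ignoring error handling, security configuration, and regions that move between batches:

    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    /** Sketch of a rolling major compaction: one region per region server per batch. */
    public class RollingMajorCompaction {

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                int batch = 0;
                boolean moreWork = true;
                while (moreWork) {
                    moreWork = false;
                    for (ServerName server : admin.getClusterStatus().getServers()) {
                        List<HRegionInfo> regions = admin.getOnlineRegions(server);
                        if (batch < regions.size()) {
                            HRegionInfo region = regions.get(batch);
                            if (!region.getTable().isSystemTable()) {
                                // Asynchronously request a major compaction of this region.
                                admin.majorCompactRegion(region.getRegionName());
                            }
                            moreWork = true;
                        }
                    }
                    // Crude pacing between batches; a real driver would poll the
                    // per-region compaction state before moving on to the next batch.
                    Thread.sleep(60_000L);
                    batch++;
                }
            }
        }
    }

Staggering major compactions this way avoids the cluster-wide I/O spike of compacting everything at once, which matches the observation above that the external tool has little impact on normal read/write performance.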