From: John Leach
To: user@hbase.apache.org
Subject: Re: Hot Region Server With No Hot Region
Date: Fri, 2 Dec 2016 13:34:45 -0600

Here is what I see...

* Short Compaction Running on Heap:

"regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
        at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
        at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
        at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
        at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
        at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
        at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

* WAL Syncs waiting... ALL 5:

"sync.0" - Thread t@202
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <67ba892d> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
        at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
        at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
        at java.lang.Thread.run(Thread.java:745)

* Mutations backing up very badly:

"B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
        at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
        at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
        at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
        at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
        at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
        at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

Too many writers are being blocked attempting to write to the WAL. What does your disk infrastructure look like? Can you get away with multi-WAL?

Ugh...

Regards,
John Leach
> On Dec 2, 2016, at 1:20 PM, Saad Mufti wrote:
> 
> Hi Ted,
> 
> Finally we have another hotspot going on, same symptoms as before. Here is the pastebin for the stack trace from the region server that I obtained via VisualVM:
> 
> http://pastebin.com/qbXPPrXk
> 
> Would really appreciate any insight you or anyone else can provide.
> 
> Thanks.
> 
> ----
> Saad
> 
> On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti wrote:
> 
>> Sure will, the next time it happens.
>> 
>> Thanks!!!
>> 
>> ----
>> Saad
>> 
>> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu wrote:
>> 
>>> From #2 in the initial email, the hbase:meta might not be the cause for the hotspot.
>>> 
>>> Saad:
>>> Can you pastebin the stack trace of the hot region server when this happens again?
>>> 
>>> Thanks
>>> 
>>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti wrote:
>>>> 
>>>> We used a pre-split into 1024 regions at the start, but we miscalculated our data size, so there were still auto-split storms at the beginning as the data size stabilized. It has ended up at around 9,500 or so regions, plus a few thousand regions for a few other tables (much smaller). But we haven't had any new auto-splits in a couple of months, and the hotspots only started happening recently.
>>>> 
>>>> Our hashing scheme is very simple: we take the MD5 of the key, then form a 4-digit prefix based on the first two bytes of the MD5, normalized to be within the range 0-1023. I am fairly confident about this scheme, especially since even during the hotspot we see no evidence so far that any particular region is taking disproportionate traffic (based on Cloudera Manager per-region charts on the hotspot server). Does that look like a reasonable scheme to randomize which region any given key goes to? And the start of the hotspot doesn't seem to correspond to any region splitting or any region moving from one server to another.
>>>> 
>>>> Thanks.
>>>> 
>>>> ----
>>>> Saad
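To make the scheme described above concrete, here is a minimal Java sketch of that style of salt-prefix computation. The class and method names and the exact normalization are illustrative assumptions, not Saad's actual code:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class SaltPrefix {

        // Take the MD5 of the logical key, read the first two bytes as an unsigned
        // 16-bit value, reduce it into 0-1023, and prepend it as a fixed-width
        // 4-digit prefix (hypothetical helper for illustration only).
        static String saltedKey(String key) {
            try {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
                int firstTwoBytes = ((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF); // 0..65535
                int bucket = firstTwoBytes % 1024;                                  // 0..1023
                return String.format("%04d%s", bucket, key);                        // e.g. "0871<original key>"
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError("MD5 is always available in the JDK", e);
            }
        }

        public static void main(String[] args) {
            System.out.println(saltedKey("some-row-key"));
        }
    }

One useful property of this construction: because 65,536 (the range of the first two MD5 bytes) is an exact multiple of 1,024, the modulo step does not bias any bucket, so distribution across the 1,024 prefixes is as uniform as MD5 itself.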
>>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach wrote:
>>>>> 
>>>>> Saad,
>>>>> 
>>>>> A region move or split causes client connections to simultaneously refresh their meta.
>>>>> 
>>>>> The key word is "supposed". We have seen meta hotspotting from time to time, and on different versions, at Splice Machine.
>>>>> 
>>>>> How confident are you in your hashing algorithm?
>>>>> 
>>>>> Regards,
>>>>> John Leach
>>>>> 
>>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti wrote:
>>>>>> 
>>>>>> No, never thought about that. I just figured out how to locate the server for that table after you mentioned it. We'll have to keep an eye on it next time we have a hotspot to see if it coincides with the hotspot server.
>>>>>> 
>>>>>> What would be the theory for how it could become a hotspot? Isn't the client supposed to cache it and only go back for a refresh if it hits a region that is not in its expected location?
>>>>>> 
>>>>>> ----
>>>>>> Saad
>>>>>> 
>>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach wrote:
>>>>>> 
>>>>>>> Saad,
>>>>>>> 
>>>>>>> Did you validate that Meta is not on the "Hot" region server?
>>>>>>> 
>>>>>>> Regards,
>>>>>>> John Leach
>>>>>>> 
>>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid hotspotting due to inadvertent data patterns by prepending an MD5-based 4-digit hash prefix to all our data keys. This works fine most of the time, but more and more recently (as much as once or twice a day) we have occasions where one region server suddenly becomes "hot" (CPU above or around 95% in various monitoring tools). When it happens it lasts for hours, and occasionally the hotspot might jump to another region server as the master decides the region is unresponsive and gives its region to another server.
>>>>>>>> 
>>>>>>>> For the longest time, we thought this must be some single rogue key in our input data that is being hammered. All attempts to track this down have failed, though, and the following behavior argues against it being application based:
>>>>>>>> 
>>>>>>>> 1. Plotting the Get and Put rate by region on the "hot" region server in Cloudera Manager charts shows no single region is an outlier.
>>>>>>>> 
>>>>>>>> 2. Cleanly restarting just the region server process causes its regions to randomly migrate to other region servers, then it gets new ones from the HBase master, basically a sort of shuffling, and then the hotspot goes away. If it were application based, you'd expect the hotspot to just jump to another region server.
>>>>>>>> 
>>>>>>>> 3. We have pored through the region server logs and can't see anything out of the ordinary happening.
>>>>>>>> 
>>>>>>>> The only other pertinent thing to mention might be that we have a special process of our own running outside the cluster that does cluster-wide major compaction in a rolling fashion, where each batch consists of one region from each region server, and it waits until one batch is completely done before starting another. We have seen no real impact on the hotspot from shutting this down, and in normal times it doesn't impact our read or write performance much.
>>>>>>>> 
>>>>>>>> We are at our wit's end. Does anyone have experience with a scenario like this? Any help/guidance would be most appreciated.
>>>>>>>> 
>>>>>>>> -----
>>>>>>>> Saad
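For readers who want to see what that kind of external rolling-compaction driver looks like, below is a minimal sketch against the HBase 1.x Admin API. The class name, the batch construction, and the 10-second poll interval are illustrative assumptions rather than the poster's actual tool, and a production version would also need to handle the brief window before a requested compaction shows up in the reported state.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RollingMajorCompaction {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {
                // Snapshot the current region assignment, one list per region server.
                List<List<HRegionInfo>> perServer = new ArrayList<>();
                for (ServerName server : admin.getClusterStatus().getServers()) {
                    perServer.add(admin.getOnlineRegions(server));
                }
                boolean more = true;
                for (int batch = 0; more; batch++) {
                    more = false;
                    // Each batch takes at most one region from every region server.
                    List<byte[]> batchRegions = new ArrayList<>();
                    for (List<HRegionInfo> regions : perServer) {
                        if (batch < regions.size()) {
                            batchRegions.add(regions.get(batch).getRegionName());
                            more = true;
                        }
                    }
                    for (byte[] regionName : batchRegions) {
                        admin.majorCompactRegion(regionName);   // request is asynchronous
                    }
                    // Block until the whole batch has finished before starting the next one.
                    // Comparing the state by name sidesteps the version-specific enum type.
                    for (byte[] regionName : batchRegions) {
                        while (!"NONE".equals(admin.getCompactionStateForRegion(regionName).toString())) {
                            Thread.sleep(10_000L);
                        }
                    }
                }
            }
        }
    }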