From: Bryan Beaudreault
Date: Tue, 01 Dec 2015 00:59:44 +0000
Subject: Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)
To: HBase Dev List
Cc: Hbase-User

We've been doing more debugging of this and have set up the read vs write
handlers to try to at least segment this away so reads can work. We have
pretty beefy servers, and are running with the following settings:

hbase.regionserver.handler.count=1000
hbase.ipc.server.read.threadpool.size=50
hbase.ipc.server.callqueue.handler.factor=0.025
hbase.ipc.server.callqueue.read.ratio=0.6
hbase.ipc.server.callqueue.scan.ratio=0.5

We are seeing all 400 write handlers taken up by row locks for the most
part. The read handlers are mostly idle. We're thinking of changing the
ratio here, but are not sure it will help if they are all blocked on a row
lock. (A rough sketch of how these settings divide the handlers appears
after the quoted messages below.)

We just enabled DEBUG logging on all our servers and notice the following:

2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]

On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault wrote:

> Sorry the second link should be
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
>
> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>>
>> An active handler:
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
>> One that is locked:
>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>>
>> The difference between pre-rollback and post is that previously we were
>> seeing things blocked in mvcc. Now we are seeing them blocked on the
>> upsert.
>>
>> It always follows the same pattern, of 1 active handler in the upsert and
>> the rest blocked waiting for it.
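[A rough, hypothetical sketch (plain Java, not HBase source) of how the callqueue settings quoted above would divide the 1000 handlers between reads and writes. The class name and the exact rounding are assumptions; the actual assignment is done inside HBase's RW queue executor and may differ slightly, but the resulting ~400 write handlers lines up with the "400 write handlers" figure mentioned above.]

  // Illustrative arithmetic only -- not HBase source. Values taken from the
  // settings quoted above; real queue/handler assignment may round differently.
  public class HandlerSplitSketch {
    public static void main(String[] args) {
      int handlerCount   = 1000;   // hbase.regionserver.handler.count
      double queueFactor = 0.025;  // hbase.ipc.server.callqueue.handler.factor
      double readRatio   = 0.6;    // hbase.ipc.server.callqueue.read.ratio
      double scanRatio   = 0.5;    // hbase.ipc.server.callqueue.scan.ratio

      int numQueues   = (int) Math.round(handlerCount * queueFactor);  // ~25 call queues
      int readQueues  = (int) Math.round(numQueues * readRatio);       // ~15 queues for reads
      int scanQueues  = (int) Math.round(readQueues * scanRatio);      // ~8 of those for scans
      int writeQueues = numQueues - readQueues;                        // ~10 queues for writes

      int readHandlers  = (int) Math.round(handlerCount * readRatio);  // ~600 read handlers
      int writeHandlers = handlerCount - readHandlers;                 // ~400 write handlers

      System.out.printf("queues: total=%d read=%d (scan=%d) write=%d%n",
          numQueues, readQueues, scanQueues, writeQueues);
      System.out.printf("handlers: read=%d write=%d%n", readHandlers, writeHandlers);
      // With every write handler stuck behind a row lock, writes back up even
      // though the read side stays idle -- which is what the thread describes.
    }
  }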
>>
>> On Mon, Nov 30, 2015 at 6:05 PM Stack wrote:
>>
>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
>>> bbeaudreault@hubspot.com> wrote:
>>>
>>> > The rollback seems to have mostly solved the issue for one of our
>>> > clusters, but another one is still seeing long increment times:
>>> >
>>> > "slowIncrementCount": 52080,
>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
>>> > "Increment_99th_percentile": 1635.2399999999998
>>> >
>>> > Any ideas if there are other changes that may be causing a performance
>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>>> >
>>> No.
>>>
>>> Post a thread dump Bryan and it might prompt something.
>>>
>>> St.Ack
>>>
>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack wrote:
>>> >
>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
>>> > > bbeaudreault@hubspot.com> wrote:
>>> > >
>>> > > > Should this be added as a known issue in the CDH or hbase
>>> > > > documentation? It was a severe performance hit for us; all of our
>>> > > > regionservers were sitting at a few thousand queued requests.
>>> > > >
>>> > > Let me take care of that.
>>> > > St.Ack
>>> > >
>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
>>> > > > bbeaudreault@hubspot.com> wrote:
>>> > > >
>>> > > > > Yea, they are all over the place and called from client and
>>> > > > > coprocessor code. We ended up having no other option but to roll
>>> > > > > back, and aside from a few NoSuchMethodErrors due to API changes
>>> > > > > (Put#add vs Put#addColumn), it seems to be working and fixing our
>>> > > > > problem.
>>> > > > >
>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack wrote:
>>> > > > >
>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this
>>> > > > >> now. Where are your counters Bryan? In their own column family or
>>> > > > >> scattered about in a row with other Cell types?
>>> > > > >> St.Ack
>>> > > > >>
>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
>>> > > > >> bbeaudreault@hubspot.com> wrote:
>>> > > > >>
>>> > > > >> > Is there any update to this? We just upgraded all of our
>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing this
>>> > > > >> > JIRA listed in the known issues, did not know about this. Now we
>>> > > > >> > are seeing performance issues across all clusters, as we make
>>> > > > >> > heavy use of increments.
>>> > > > >> >
>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
>>> > > > >> > roll back to CDH 5.3.1 (if that is possible)?
>>> > > > >> >
>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 wrote:
>>> > > > >> >
>>> > > > >> > > Thank you St.Ack!
>>> > > > >> > >
>>> > > > >> > > I would like to follow the ticket.
>>> > > > >> > >
>>> > > > >> > > Toshihiro Suzuki
>>> > > > >> > >
>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack:
>>> > > > >> > >
>>> > > > >> > > > Back to this problem.
>>> > > > >> > > > Simple tests confirm that, as is, the single-queue-backed MVCC instance can slow Region ops if some other row is slow to complete. In particular Increment, checkAndPut, and batch mutations are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we can somehow scope mvcc to row or at least shard mvcc so not all Region ops are paused.
>>> > > > >> > > >
>>> > > > >> > > > St.Ack
>>> > > > >> > > >
>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 wrote:
>>> > > > >> > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful diagram). Makes sense. Let me hack up a test case to help with the illustration. It is as though the mvcc should be scoped to a row only... Writes against other rows should not hold up my read of my row. Tag an mvcc with a 'row' scope so we can see which on-going writes pertain to current operation?
>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
>>> > > > >> > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at increment time?
>>> > > > >> > > > > Yes, we need it.
>>> > > > >> > > > >
>>> > > > >> > > > > I would like to help if there is anything I can do.
>>> > > > >> > > > >
>>> > > > >> > > > > Thanks,
>>> > > > >> > > > > Toshihiro Suzuki
>>> > > > >> > > > >
>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack:
>>> > > > >> > > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful diagram). Makes sense. Let me hack up a test case to help with the illustration. It is as though the mvcc should be scoped to a row only... Writes against other rows should not hold up my read of my row. Tag an mvcc with a 'row' scope so we can see which on-going writes pertain to current operation?
>>> > > > >> > > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at increment time?
>>> > > > >> > > > > >
>>> > > > >> > > > > > (This is a good one)
>>> > > > >> > > > > >
>>> > > > >> > > > > > Thank you Toshihiro Suzuki
>>> > > > >> > > > > > St.Ack
>>> > > > >> > > > > >
>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > >
>>> > > > >> > > > > > > St.Ack,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thank you for your response.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > The reason I think "A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can cause a region lock.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits
>>> > > > >> > > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think that when a handler thread is processing between step 2 and step 3,
>>> > > > >> > > > > > > the other handler threads can wait at step 1 until the thread completes step 3.
>>> > > > >> > > > > > > This is depicted as follows:
>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Actually, in the thread dump of our region server, many handler threads
>>> > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Many handler threads wait at this:
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
>>> > > > >> > > > > > > > update it? Could the app behavior have changed? Or are you thinking
>>> > > > >> > > > > > > > increment itself has slowed significantly?
>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
>>> > > > >> > > > > > > thinking increment itself has slowed significantly.
>>> > > > >> > > > > > > Before upgrading HBase, throughput and latency were good.
>>> > > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thanks,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > > Ted,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Thank you for your response.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
>>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that increment operation works as follows:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 1. get row lock
>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
>>> > > > >> > > > > > > > > 4. get previous values
>>> > > > >> > > > > > > > > 5. create KVs
>>> > > > >> > > > > > > > > 6. write to Memstore
>>> > > > >> > > > > > > > > 7. write to WAL
>>> > > > >> > > > > > > > > 8. release row lock
>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
>>> > > > >> > > > > > > > > writes named writeQueue.
>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
>>> > > > >> > > > > > > > > from writeQueue.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that when a handler thread is processing between step 2 and
>>> > > > >> > > > > > > > > step 9, the other handler threads can wait until the thread completes step 9.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > That is right. We need to read after all outstanding updates are done...
>>> > > > >> > > > > > > > because we need to read the latest update before we go to modify/increment it.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > How do you make out this?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > In 0.98.x we did this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > ... and in 1.0 we do this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > + public void waitForPreviousTransactionsComplete() {
>>> > > > >> > > > > > > > +   WriteEntry w = beginMemstoreInsert();
>>> > > > >> > > > > > > > +   waitForPreviousTransactionsComplete(w);
>>> > > > >> > > > > > > > + }
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
>>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
>>> > > > >> > > > > > > > region sequenceid would spin independently of each other. Perhaps this is
>>> > > > >> > > > > > > > responsible for some of the slowdown.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in the Get. If
>>> > > > >> > > > > > > > you do a bunch of thread dumps in a row, where is the lock-holding thread?
>>> > > > >> > > > > > > > In Get or writing Increment... or waiting on sequence id?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
>>> > > > >> > > > > > > > update it? Could the app behavior have changed? Or are you thinking
>>> > > > >> > > > > > > > increment itself has slowed significantly?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > St.Ack
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > > Thanks,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > try {
>>> > > > >> > > > > > > > > >   rowLock = getRowLock(row);
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace?
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Thanks
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > > Hi,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to CDH5.4.5(HBase1.0.0)
>>> > > > >> > > > > > > > > > > and we are experiencing a slowdown in increment operations.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Here's an extract from the thread dump of the RegionServer of our cluster:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>>> > > > >> > > > > > > > > > >   State: BLOCKED
>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
>>> > > > >> > > > > > > > > > >   Waited count: 39828360
>>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>>> > > > >> > > > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>>> > > > >> > > > > > > > > > >   Stack:
>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > There are many similar threads in the thread dump.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > I read the source code and I think this is caused by changes to
>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > We also wrote performance test code for the increment operation that
>>> > > > >> > > > > > > > > > > used 100 threads and ran it in local mode.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > The results are shown below:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thanks,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
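[A heavily simplified, hypothetical model (plain Java, not the actual MultiVersionConsistencyControl source) of the single region-wide write queue described in this thread: every write registers a WriteEntry in one shared queue, and waitForPreviousTransactionsComplete blocks until that entry reaches the head, so a slow write on any other row in the region delays it.]

  // Simplified, hypothetical model of the shared MVCC write queue discussed
  // above -- NOT the real HBase class. Names loosely mirror the HBase 1.0 /
  // CDH5.4.x MultiVersionConsistencyControl.
  import java.util.LinkedList;

  public class SimplifiedMvcc {
    public static final class WriteEntry {
      volatile boolean completed = false;
    }

    // One queue per region; entries for ALL rows share it.
    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

    // Roughly steps 2/3 in the thread: register an in-flight write.
    public synchronized WriteEntry beginMemstoreInsert() {
      WriteEntry e = new WriteEntry();
      writeQueue.add(e);
      return e;
    }

    // Roughly step 9: mark the write complete and advance the queue head.
    public synchronized void completeMemstoreInsert(WriteEntry e) {
      e.completed = true;
      while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
        writeQueue.removeFirst();
      }
      notifyAll();
    }

    // Roughly step 1: wait until every previously queued write -- on any row --
    // has completed. This is where the blocked handler threads pile up.
    public synchronized void waitForPreviousTransactionsComplete() throws InterruptedException {
      WriteEntry w = beginMemstoreInsert();
      while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
        wait();
      }
      completeMemstoreInsert(w);
    }
  }

Because the queue (and the wait) is per region rather than per row in this model, Increments, checkAndPuts, and batch mutations on unrelated rows end up serialized behind each other, which matches the blocked RW.default.writeRpcServer.handler threads in the thread dumps above.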