From: Bryan Beaudreault <bbeaudreault@hubspot.com>
Date: Mon, 30 Nov 2015 23:11:49 +0000
Subject: Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)
To: HBase Dev List
Cc: Hbase-User

Sorry, the second link should be
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579

On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault wrote:

> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>
> An active handler:
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> One that is locked:
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>
> The difference between pre-rollback and post is that previously we were
> seeing things blocked in mvcc. Now we are seeing them blocked on the
> upsert.
>
> It always follows the same pattern: one active handler in the upsert and
> the rest blocked waiting for it.

On Mon, Nov 30, 2015 at 6:05 PM Stack wrote:

> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault
> <bbeaudreault@hubspot.com> wrote:
>
>> The rollback seems to have mostly solved the issue for one of our
>> clusters, but another one is still seeing long increment times:
>>
>> "slowIncrementCount": 52080,
>> "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
>> "Increment_mean": 465.68678129112396, "Increment_median": 216,
>> "Increment_75th_percentile": 450.25,
>> "Increment_95th_percentile": 1052.6499999999999,
>> "Increment_99th_percentile": 1635.2399999999998
>>
>> Any ideas if there are other changes that may be causing a performance
>> regression for increments between CDH4.7.1 and CDH5.3.8?
>
> No.
>
> Post a thread dump Bryan and it might prompt something.
>
> St.Ack

On Mon, Nov 30, 2015 at 4:13 PM Stack wrote:

> On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault
> <bbeaudreault@hubspot.com> wrote:
>
>> Should this be added as a known issue in the CDH or hbase
>> documentation? It was a severe performance hit for us; all of our
>> regionservers were sitting at a few thousand queued requests.
>
> Let me take care of that.
> St.Ack

On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault wrote:

> Yea, they are all over the place and called from client and coprocessor
> code. We ended up having no other option but to roll back, and aside
> from a few NoSuchMethodErrors due to API changes (Put#add vs
> Put#addColumn), it seems to be working and fixing our problem.
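For reference, the Put#add vs Put#addColumn rename behind those
NoSuchMethodErrors, as a minimal sketch (the class name and the
row/family/qualifier values here are placeholders):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutApiChange {
      public static void main(String[] args) {
        Put p = new Put(Bytes.toBytes("row1"));
        // HBase 1.0 style -- what code compiled against CDH5.4 calls:
        p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        // HBase 0.98 style -- addColumn does not exist there, so 1.0-compiled
        // callers fail with NoSuchMethodError after rolling the jars back:
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      }
    }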
On Mon, Nov 30, 2015 at 3:47 PM Stack wrote:

> Rollback is untested. No fix in 5.5. I was going to work on this now.
> Where are your counters Bryan? In their own column family or scattered
> about in a row with other Cell types?
> St.Ack

On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault
<bbeaudreault@hubspot.com> wrote:

> Is there any update to this? We just upgraded all of our production
> clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
> known issues, did not know about it. Now we are seeing performance
> issues across all clusters, as we make heavy use of increments.
>
> Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back
> to CDH5.3.1 (if that is possible)?

On Thu, Sep 24, 2015 at 5:06 AM Toshihiro Suzuki wrote:

> Thank you St.Ack!
>
> I would like to follow the ticket.
>
> Toshihiro Suzuki

2015-09-22 14:14 GMT+09:00 Stack:

> Back to this problem. Simple tests confirm that, as is, the
> single-queue-backed MVCC instance can slow Region ops if some other row
> is slow to complete. In particular, Increment, checkAndPut, and batch
> mutations are affected. I opened HBASE-14460 to start in on a fix. Let's
> see if we can somehow scope mvcc to a row, or at least shard mvcc, so
> not all Region ops are paused.
>
> St.Ack
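To make the "shard mvcc" idea concrete, a purely hypothetical sketch (not
the HBASE-14460 patch; the class and types are stand-ins). Rows hash to one
of N independent queues, so a slow transaction only stalls writers whose
rows land in the same shard:

    import java.util.Arrays;
    import java.util.LinkedList;

    public class ShardedMvccSketch {
      // One independent write queue per shard, instead of a single
      // region-wide queue that serializes every write in the region.
      private final LinkedList<Object>[] queues;

      @SuppressWarnings("unchecked")
      public ShardedMvccSketch(int shardCount) {
        queues = new LinkedList[shardCount];
        for (int i = 0; i < shardCount; i++) {
          queues[i] = new LinkedList<>();
        }
      }

      // All MVCC waits for a row go through its shard's queue only.
      LinkedList<Object> queueFor(byte[] row) {
        return queues[(Arrays.hashCode(row) & 0x7fffffff) % queues.length];
      }
    }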
On Mon, Sep 14, 2015 at 4:15 AM, Toshihiro Suzuki wrote:

>> Thank you for the below reasoning (with accompanying helpful diagram).
>> Makes sense. Let me hack up a test case to help with the illustration.
>> It is as though the mvcc should be scoped to a row only... Writes
>> against other rows should not hold up my read of my row. Tag an mvcc
>> with a 'row' scope so we can see which on-going writes pertain to the
>> current operation?
>
> Thank you St.Ack! I think this approach would work.
>
>> You need to read back the increment and have it be 'correct' at
>> increment time?
>
> Yes, we need it.
>
> I would like to help if there is anything I can do.
>
> Thanks,
> Toshihiro Suzuki

2015-09-13 14:11 GMT+09:00 Stack:

> Thank you for the below reasoning (with accompanying helpful diagram).
> Makes sense. Let me hack up a test case to help with the illustration.
> It is as though the mvcc should be scoped to a row only... Writes
> against other rows should not hold up my read of my row. Tag an mvcc
> with a 'row' scope so we can see which on-going writes pertain to the
> current operation?
>
> You need to read back the increment and have it be 'correct' at
> increment time?
>
> (This is a good one)
>
> Thank you, Toshihiro Suzuki
> St.Ack

On Sat, Sep 12, 2015 at 8:09 AM, Toshihiro Suzuki <brfrn169@gmail.com>
wrote:

> St.Ack,
>
> Thank you for your response.
>
> The reason I concluded that "A region lock (not a row lock) seems to
> occur in waitForPreviousTransactionsComplete()" is as follows.
>
> An increment operation has three MVCC procedures:
>
> 1. mvcc.waitForPreviousTransactionsComplete();
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>
> 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>
> 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
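Pulled together, the increment path above has roughly this shape (a
stand-in skeleton with simplified types, not the real HRegion code; the
Get, memstore write, and WAL append between the MVCC calls are elided):

    final class IncrementShape {
      interface Mvcc {
        void waitForPreviousTransactionsComplete();       // step 1
        // step 2; returns a WriteEntry in the real class:
        long beginMemstoreInsertWithSeqNum(long mvccNum);
        void completeMemstoreInsertWithSeqNum(long w, Object walKey); // step 3
      }

      static void increment(Mvcc mvcc, long mvccNum, Object walKey) {
        mvcc.waitForPreviousTransactionsComplete(); // blocks on the shared writeQueue
        long w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
        // ... read current cells, build the incremented cells, write them ...
        mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
      }
    }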
> I think that MultiVersionConsistencyControl's writeQueue can cause a
> region lock:
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>
> Step 2 adds a WriteEntry to writeQueue:
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>
> Step 3 removes the WriteEntry from writeQueue:
>
> completeMemstoreInsertWithSeqNum(w, walKey) ->
> waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>
> Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> waits until writeQueue is empty or writeQueue.getFirst() == w:
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>
> I think when a handler thread is processing between step 2 and step 3,
> the other handler threads can wait at step 1 until the thread completes
> step 3. This is depicted as follows:
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
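As an illustration of the queue discipline described above, a minimal
stand-in model (not the HBase class, though the method names mirror it): a
single shared FIFO where every write enqueues an entry, and the step 1 wait
blocks until that entry reaches the head, i.e. until every earlier write in
the region, for any row, has completed.

    import java.util.LinkedList;

    public class WriteQueueModel {
      static final class WriteEntry {
        boolean completed = false;
      }

      private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

      // Enqueue a new entry for this write (model of beginMemstoreInsert).
      public WriteEntry beginMemstoreInsert() {
        synchronized (writeQueue) {
          WriteEntry e = new WriteEntry();
          writeQueue.add(e);
          return e;
        }
      }

      // Block until w is at the head of the queue, i.e. all earlier
      // transactions (for ANY row in the region) have completed.
      public void waitForPreviousTransactionsComplete(WriteEntry w)
          throws InterruptedException {
        synchronized (writeQueue) {
          while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
            writeQueue.wait(); // handlers park here regardless of which row they touch
          }
        }
        advanceMemstore(w);
      }

      // Mark w complete and drop completed entries off the head of the queue.
      public void advanceMemstore(WriteEntry w) {
        synchronized (writeQueue) {
          w.completed = true;
          while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
            writeQueue.removeFirst();
          }
          writeQueue.notifyAll();
        }
      }
    }

A small harness showing the resulting stall appears below in the thread.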
> Actually, in the thread dump of our region server, many handler threads
> (RW.default.writeRpcServer.handler) wait at Step 1
> (waitForPreviousTransactionsComplete()):
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>
> Many handler threads wait at this:
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>
>> Is it possible you are contending on a counter post-upgrade? Is it
>> possible that all these threads are trying to get to the same row to
>> update it? Could the app behavior have changed? Or are you thinking
>> increment itself has slowed significantly?
>
> We have just upgraded HBase, not changed the app behavior. We are
> thinking increment itself has slowed significantly. Before upgrading
> HBase, throughput and latency were good. Currently, to cope with this
> problem, we split the regions finely.
>
> Thanks,
>
> Toshihiro Suzuki

2015-09-09 15:29 GMT+09:00 Stack:

> On Tue, Sep 8, 2015 at 10:22 PM, Toshihiro Suzuki <brfrn169@gmail.com>
> wrote:
>
>> Ted,
>>
>> Thank you for your response.
>>
>> I uploaded the complete stack trace to Gist:
>> https://gist.github.com/brfrn169/cb4f2c157129330cd932
>>
>> I think that the increment operation works as follows:
>>
>> 1. get row lock
>> 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
>>    MVCC transactions to finish
>> 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
>> 4. get previous values
>> 5. create KVs
>> 6. write to Memstore
>> 7. write to WAL
>> 8. release row lock
>> 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
>>
>> An instance of MultiVersionConsistencyControl has a pending queue of
>> writes named writeQueue.
>> Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
>> empty or writeQueue.getFirst() == w.
>> Step 3 puts a WriteEntry to writeQueue and step 9 removes the
>> WriteEntry from writeQueue.
>>
>> I think that when a handler thread is processing between step 2 and
>> step 9, the other handler threads can wait until the thread completes
>> step 9.
>>
>> Thanks,
>> Toshihiro Suzuki
>
> That is right. We need to read, after all outstanding updates are
> done... because we need to read the latest update before we go to
> modify/increment it.
>
> How do you make that out?
>
> "A region lock (not a row lock) seems to occur in
> waitForPreviousTransactionsComplete()."
>
> In 0.98.x we did this:
>
> mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>
> ... and in 1.0 we do this:
>
> mvcc.waitForPreviousTransactionsComplete(), which is this:
>
> +  public void waitForPreviousTransactionsComplete() {
> +    WriteEntry w = beginMemstoreInsert();
> +    waitForPreviousTransactionsComplete(w);
> +  }
>
> The mvcc and region sequenceid were merged in 1.0
> (https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
> region sequenceid would spin independent of each other. Perhaps this is
> responsible for some of the slowdown.
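The stall being described can be reproduced against the stand-in
WriteQueueModel sketched earlier in the thread (a hypothetical harness; the
one-second sleep stands in for a slow transaction such as a pending WAL
sync):

    public class StallDemo {
      public static void main(String[] args) throws InterruptedException {
        WriteQueueModel mvcc = new WriteQueueModel();
        // Handler A begins an insert and does not complete it yet.
        WriteQueueModel.WriteEntry slow = mvcc.beginMemstoreInsert();

        Thread b = new Thread(() -> {
          try {
            // Handler B works on a DIFFERENT row, yet still parks behind A.
            WriteQueueModel.WriteEntry w = mvcc.beginMemstoreInsert();
            mvcc.waitForPreviousTransactionsComplete(w);
            System.out.println("B unblocked");
          } catch (InterruptedException ignored) {
          }
        });
        b.start();

        Thread.sleep(1000);         // B stays parked for the whole second
        mvcc.advanceMemstore(slow); // A completes; the queue drains; B proceeds
        b.join();
      }
    }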
> That said, looking in your thread dump, we seem to be down in the Get.
> If you do a bunch of thread dumps in a row, where is the lock-holding
> thread? In Get or writing Increment... or waiting on sequence id?
>
> Is it possible you are contending on a counter post-upgrade? Is it
> possible that all these threads are trying to get to the same row to
> update it? Could the app behavior have changed? Or are you thinking
> increment itself has slowed significantly?
>
> St.Ack

2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:

> In HRegion#increment(), we lock the row (not the region):
>
>   try {
>     rowLock = getRowLock(row);
>
> Can you pastebin the complete stack trace?
>
> Thanks
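For contrast with the region-wide writeQueue, a stand-in model of what
per-row locking provides (again a sketch, not the HBase implementation):
increments on different rows never contend here, which is why the observed
pile-up points at the shared MVCC wait taken after the row lock is
acquired.

    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    public class RowLocks {
      // One lock per row key; only writers of the SAME row contend here.
      private final ConcurrentHashMap<String, ReentrantLock> locks =
          new ConcurrentHashMap<>();

      public ReentrantLock getRowLock(byte[] row) {
        ReentrantLock lock = locks.computeIfAbsent(
            new String(row, StandardCharsets.UTF_8), k -> new ReentrantLock());
        lock.lock();
        return lock;
      }

      // Callers pair getRowLock() with release in a finally block.
      public void releaseRowLock(ReentrantLock lock) {
        lock.unlock();
      }
    }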
On Tue, Sep 8, 2015 at 2:01 AM, Toshihiro Suzuki <brfrn169@gmail.com>
wrote:

> Hi,
>
> We upgraded our cluster from CDH5.3.1 (HBase 0.98.6) to CDH5.4.5
> (HBase 1.0.0) and we are experiencing a slowdown in the increment
> operation.
>
> Here's an extract from a thread dump of the RegionServer of our cluster:
>
> Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>   State: BLOCKED
>   Blocked count: 21689888
>   Waited count: 39828360
>   Blocked on java.util.LinkedList@3474e4b2
>   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>   Stack:
>     java.lang.Object.wait(Native Method)
>     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     java.lang.Thread.run(Thread.java:745)
>
> There are many similar threads in the thread dump.
>
> I read the source code and I think this is caused by changes to
> MultiVersionConsistencyControl. A region lock (not a row lock) seems to
> occur in waitForPreviousTransactionsComplete().
>
> Also, we wrote performance test code for the increment operation that
> used 100 threads and ran it in local mode.
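The original test code was not posted; a minimal sketch of a test with that
shape might look like the following (table, family, and qualifier names are
placeholders; the HBase 1.0 client API is assumed):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IncrementBench {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int threads = 100, perThread = 1000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.currentTimeMillis();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
          for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
              // Each thread increments its own counter row as fast as it can.
              try (Table table = conn.getTable(TableName.valueOf("bench"))) {
                byte[] row = Bytes.toBytes("row-" + id);
                for (int i = 0; i < perThread; i++) {
                  table.incrementColumnValue(
                      row, Bytes.toBytes("f"), Bytes.toBytes("q"), 1L);
                }
              } catch (Exception e) {
                e.printStackTrace();
              }
            });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
        }
        long ms = System.currentTimeMillis() - start;
        System.out.printf("Throughput(op/s): %.0f%n",
            threads * perThread * 1000.0 / ms);
      }
    }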
> The results are shown below:
>
> CDH5.3.1 (HBase 0.98.6)
> Throughput (op/s): 12757, Latency (ms): 7.975072509210629
>
> CDH5.4.5 (HBase 1.0.0)
> Throughput (op/s): 2027, Latency (ms): 49.11840157868772
>
> Thanks,
>
> Toshihiro Suzuki