From: Bryan Beaudreault
Date: Mon, 30 Nov 2015 18:28:14 +0000
Subject: Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)
To: dev@hbase.apache.org, user@hbase.apache.org
Is there any update to this? We just upgraded all of our production
clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
known issues, did not know about this. Now we are seeing performance
issues across all clusters, as we make heavy use of increments.

Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back to
CDH5.3.1 (if that is possible)?

On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 wrote:

> Thank you St.Ack!
>
> I would like to follow the ticket.
>
> Toshihiro Suzuki
>
> 2015-09-22 14:14 GMT+09:00 Stack:
>
> > Back to this problem. Simple tests confirm that as is, the
> > single-queue-backed MVCC instance can slow Region ops if some other row
> > is slow to complete. In particular Increment, checkAndPut, and batch
> > mutations are affected. I opened HBASE-14460 to start in on a fix up.
> > Let's see if we can somehow scope mvcc to row or at least shard mvcc so
> > not all Region ops are paused.
> >
> > St.Ack
> >
> > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> >
> > > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > > Makes sense. Let me hack up a test case to help with the illustration.
> > > > It is as though the mvcc should be scoped to a row only... Writes
> > > > against other rows should not hold up my read of my row. Tag an mvcc
> > > > with a 'row' scope so we can see which on-going writes pertain to the
> > > > current operation?
> > > Thank you St.Ack! I think this approach would work.
> > >
> > > > You need to read back the increment and have it be 'correct' at
> > > > increment time?
> > > Yes, we need it.
> > >
> > > I would like to help if there is anything I can do.
> > >
> > > Thanks,
> > > Toshihiro Suzuki
> > >
> > > 2015-09-13 14:11 GMT+09:00 Stack:
> > >
> > > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > > Makes sense. Let me hack up a test case to help with the illustration.
> > > > It is as though the mvcc should be scoped to a row only... Writes
> > > > against other rows should not hold up my read of my row. Tag an mvcc
> > > > with a 'row' scope so we can see which on-going writes pertain to the
> > > > current operation?
> > > >
> > > > You need to read back the increment and have it be 'correct' at
> > > > increment time?
> > > >
> > > > (This is a good one)
> > > >
> > > > Thank you Toshihiro Suzuki
> > > > St.Ack
> > > >
> > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 wrote:
> > > >
> > > > > St.Ack,
> > > > >
> > > > > Thank you for your response.
> > > > >
> > > > > The reason I think that "A region lock (not a row lock) seems to
> > > > > occur in waitForPreviousTransactionsComplete()" is as follows:
> > > > >
> > > > > An increment operation has 3 MVCC procedures.
> > > > >
> > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > >
> > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > >
> > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > >
> > > > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> > > > > region lock.
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > >
> > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > >
> > > > > Step 3 removes the WriteEntry from writeQueue:
> > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > >
> > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> > > > > waits until writeQueue is empty or writeQueue.getFirst() == w.
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > >
> > > > > I think that when a handler thread is processing between step 2 and
> > > > > step 3, the other handler threads can wait at step 1 until that thread
> > > > > completes step 3. This is depicted as follows:
> > > > >
> > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > >
> > > > > Actually, in the thread dump of our region server, many handler threads
> > > > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > > > (waitForPreviousTransactionsComplete()).
> > > > >
> > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > >
> > > > > Many handler threads wait at this:
> > > > >
> > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
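> > > > > For illustration, here is a minimal, simplified sketch of the
> > > > > single-queue pattern described above (this is not the actual
> > > > > MultiVersionConsistencyControl code; class and method names are
> > > > > simplified stand-ins):
> > > > >
> > > > > import java.util.LinkedList;
> > > > >
> > > > > // Simplified model: one queue per region, shared by writes to every row.
> > > > > class SimplifiedMvcc {
> > > > >   static final class WriteEntry {
> > > > >     boolean completed = false;
> > > > >   }
> > > > >
> > > > >   private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
> > > > >
> > > > >   // Step 2: register an in-flight write (region-wide, not per-row).
> > > > >   synchronized WriteEntry begin() {
> > > > >     WriteEntry e = new WriteEntry();
> > > > >     writeQueue.add(e);
> > > > >     return e;
> > > > >   }
> > > > >
> > > > >   // Step 3: mark the write complete, drop finished entries from the
> > > > >   // head of the queue, and wake up threads waiting in step 1.
> > > > >   synchronized void complete(WriteEntry e) {
> > > > >     e.completed = true;
> > > > >     while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
> > > > >       writeQueue.removeFirst();
> > > > >     }
> > > > >     notifyAll();
> > > > >   }
> > > > >
> > > > >   // Step 1: enqueue a marker entry w and wait until it reaches the head,
> > > > >   // i.e. until every earlier write in the region has completed.
> > > > >   synchronized void waitForPreviousTransactionsComplete()
> > > > >       throws InterruptedException {
> > > > >     WriteEntry w = begin();
> > > > >     while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
> > > > >       wait();
> > > > >     }
> > > > >     complete(w);
> > > > >   }
> > > > > }
> > > > >
> > > > > Because the queue is shared by the whole region, one slow write to any
> > > > > row keeps every later caller parked in step 1.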
> > > > >
> > > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > > > > possible that all these threads are trying to get to the same row to
> > > > > > update it? Could the app behavior have changed? Or are you thinking
> > > > > > increment itself has slowed significantly?
> > > > > We have just upgraded HBase, not changed the app behavior. We are
> > > > > thinking increment itself has slowed significantly.
> > > > > Before upgrading HBase, throughput and latency were good.
> > > > > Currently, to cope with this problem, we split the regions finely.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Toshihiro Suzuki
> > > > >
> > > > > 2015-09-09 15:29 GMT+09:00 Stack:
> > > > >
> > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 wrote:
> > > > > >
> > > > > > > Ted,
> > > > > > >
> > > > > > > Thank you for your response.
> > > > > > >
> > > > > > > I uploaded the complete stack trace to Gist.
> > > > > > >
> > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > > >
> > > > > > > I think that the increment operation works as follows:
> > > > > > >
> > > > > > > 1. get row lock
> > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
> > > > > > > MVCC transactions to finish
> > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > > 4. get previous values
> > > > > > > 5. create KVs
> > > > > > > 6. write to Memstore
> > > > > > > 7. write to WAL
> > > > > > > 8. release row lock
> > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > > > >
> > > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
> > > > > > > writes named writeQueue.
> > > > > > > Step 2 puts a WriteEntry w into writeQueue and waits until writeQueue
> > > > > > > is empty or writeQueue.getFirst() == w.
> > > > > > > Step 3 puts a WriteEntry into writeQueue and step 9 removes the
> > > > > > > WriteEntry from writeQueue.
> > > > > > >
> > > > > > > I think that when a handler thread is processing between step 2 and
> > > > > > > step 9, the other handler threads can wait until the thread completes
> > > > > > > step 9.
> > > > > > >
> > > > > > That is right. We need to read, after all outstanding updates are
> > > > > > done... because we need to read the latest update before we go to
> > > > > > modify/increment it.
> > > > > >
> > > > > > How do you make out this?
> > > > > >
> > > > > > "A region lock (not a row lock) seems to occur in
> > > > > > waitForPreviousTransactionsComplete()."
> > > > > >
> > > > > > In 0.98.x we did this:
> > > > > >
> > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > > >
> > > > > > ... and in 1.0 we do this:
> > > > > >
> > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > > >
> > > > > > + public void waitForPreviousTransactionsComplete() {
> > > > > > +   WriteEntry w = beginMemstoreInsert();
> > > > > > +   waitForPreviousTransactionsComplete(w);
> > > > > > + }
> > > > > >
> > > > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > > > https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
> > > > > > region sequenceid would spin independently of each other. Perhaps this
> > > > > > is responsible for some of the slowdown.
> > > > > >
> > > > > > That said, looking in your thread dump, we seem to be down in the Get.
> > > > > > If you do a bunch of thread dumps in a row, where is the lock-holding
> > > > > > thread? In Get or writing Increment... or waiting on sequence id?
> > > > > >
> > > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > > > > possible that all these threads are trying to get to the same row to
> > > > > > update it? Could the app behavior have changed? Or are you thinking
> > > > > > increment itself has slowed significantly?
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Toshihiro Suzuki
> > > > > > >
> > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu:
> > > > > > >
> > > > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > > > >
> > > > > > > >     try {
> > > > > > > >       rowLock = getRowLock(row);
> > > > > > > >
> > > > > > > > Can you pastebin the complete stack trace?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > > > CDH5.4.5(HBase1.0.0) and we are experiencing a slowdown in
> > > > > > > > > increment operations.
> > > > > > > > >
> > > > > > > > > Here's an extract from a thread dump of the RegionServer of our
> > > > > > > > > cluster:
> > > > > > > > >
> > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > > >   State: BLOCKED
> > > > > > > > >   Blocked count: 21689888
> > > > > > > > >   Waited count: 39828360
> > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > > >   Stack:
> > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > > >     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > > >     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > > >
> > > > > > > > > There are many similar threads in the thread dump.
> > > > > > > > >
> > > > > > > > > I read the source code and I think this is caused by changes to
> > > > > > > > > MultiVersionConsistencyControl.
> > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > > > waitForPreviousTransactionsComplete().
> > > > > > > > >
> > > > > > > > > We also wrote performance test code for the increment operation
> > > > > > > > > that uses 100 threads and ran it in local mode.
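> > > > > > > > > The test code itself is not included in this thread; as a rough
> > > > > > > > > sketch only, a 100-thread increment benchmark against the 1.0
> > > > > > > > > client API could look something like the following (the table,
> > > > > > > > > family, and qualifier names are made up):
> > > > > > > > >
> > > > > > > > > import java.util.concurrent.*;
> > > > > > > > > import java.util.concurrent.atomic.AtomicLong;
> > > > > > > > > import org.apache.hadoop.conf.Configuration;
> > > > > > > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > > > > > import org.apache.hadoop.hbase.TableName;
> > > > > > > > > import org.apache.hadoop.hbase.client.*;
> > > > > > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > > > > > > >
> > > > > > > > > public class IncrementBenchmark {
> > > > > > > > >   public static void main(String[] args) throws Exception {
> > > > > > > > >     final int threads = 100;
> > > > > > > > >     final int incrementsPerThread = 1000;
> > > > > > > > >     Configuration conf = HBaseConfiguration.create();
> > > > > > > > >     try (Connection conn = ConnectionFactory.createConnection(conf)) {
> > > > > > > > >       final AtomicLong latencyNanos = new AtomicLong();
> > > > > > > > >       ExecutorService pool = Executors.newFixedThreadPool(threads);
> > > > > > > > >       long start = System.nanoTime();
> > > > > > > > >       for (int t = 0; t < threads; t++) {
> > > > > > > > >         final int threadId = t;
> > > > > > > > >         pool.submit(new Callable<Void>() {
> > > > > > > > >           @Override public Void call() throws Exception {
> > > > > > > > >             // One Table per thread; Table instances are not thread-safe.
> > > > > > > > >             try (Table table = conn.getTable(TableName.valueOf("incr_test"))) {
> > > > > > > > >               for (int i = 0; i < incrementsPerThread; i++) {
> > > > > > > > >                 Increment inc = new Increment(Bytes.toBytes("row-" + threadId));
> > > > > > > > >                 inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), 1L);
> > > > > > > > >                 long begin = System.nanoTime();
> > > > > > > > >                 table.increment(inc);
> > > > > > > > >                 latencyNanos.addAndGet(System.nanoTime() - begin);
> > > > > > > > >               }
> > > > > > > > >             }
> > > > > > > > >             return null;
> > > > > > > > >           }
> > > > > > > > >         });
> > > > > > > > >       }
> > > > > > > > >       pool.shutdown();
> > > > > > > > >       pool.awaitTermination(1, TimeUnit.HOURS);
> > > > > > > > >       long ops = (long) threads * incrementsPerThread;
> > > > > > > > >       double seconds = (System.nanoTime() - start) / 1e9;
> > > > > > > > >       System.out.println("Throughput(op/s): " + (long) (ops / seconds)
> > > > > > > > >           + ", Latency(ms): " + (latencyNanos.get() / 1e6 / ops));
> > > > > > > > >     }
> > > > > > > > >   }
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Whether every thread hits its own row (as above) or all threads
> > > > > > > > > share one counter row changes how much of the contention comes
> > > > > > > > > from the per-row lock versus the region-wide MVCC write queue.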
> > > > > > > > >
> > > > > > > > > The result is shown below:
> > > > > > > > >
> > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > > >
> > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Toshihiro Suzuki