Subject: Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)
From: Elliott Clark
To: dev@hbase.apache.org
Date: Mon, 21 Sep 2015 23:12:44 -0700

Commented up on the jira. But I think there's a pretty easy solution that
we can do, and it should be possible in the near future. We will continue
to have issues in situations that are highly contended on just a small
number of rows, but there's not a whole lot that I can see to make that
situation much faster.

On Mon, Sep 21, 2015 at 10:14 PM, Stack wrote:

> Back to this problem. Simple tests confirm that, as is, the
> single-queue-backed MVCC instance can slow Region ops if some other row is
> slow to complete. In particular Increment, checkAndPut, and batch mutations
> are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we
> can somehow scope mvcc to a row, or at least shard mvcc, so not all Region
> ops are paused.
>
> St.Ack
>
>
> On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 (Toshihiro Suzuki) wrote:
>
> > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > Makes sense. Let me hack up a test case to help with the illustration. It
> > > is as though the mvcc should be scoped to a row only... Writes against
> > > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > > scope so we can see which on-going writes pertain to the current operation?
> >
> > Thank you St.Ack! I think this approach would work.
> >
> > > You need to read back the increment and have it be 'correct' at increment
> > > time?
> >
> > Yes, we need it.
> >
> > I would like to help if there is anything I can do.
> >
> > Thanks,
> > Toshihiro Suzuki
> >
> >
> > 2015-09-13 14:11 GMT+09:00 Stack :
> >
> > > Thank you for the below reasoning (with accompanying helpful diagram).
> > > Makes sense.
> > > Let me hack up a test case to help with the illustration. It
> > > is as though the mvcc should be scoped to a row only... Writes against
> > > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > > scope so we can see which on-going writes pertain to the current operation?
> > >
> > > You need to read back the increment and have it be 'correct' at increment
> > > time?
> > >
> > > (This is a good one)
> > >
> > > Thank you Toshihiro Suzuki
> > > St.Ack
> > >
> > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 (Toshihiro Suzuki) wrote:
> > >
> > > > St.Ack,
> > > >
> > > > Thank you for your response.
> > > >
> > > > The reason I concluded that "A region lock (not a row lock) seems to occur
> > > > in waitForPreviousTransactionsComplete()" is as follows.
> > > >
> > > > An increment operation has 3 MVCC-related steps:
> > > >
> > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > >
> > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > >
> > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > >
> > > > I think that MultiVersionConsistencyControl's writeQueue can cause a region
> > > > lock.
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > >
> > > > Step 2 adds a WriteEntry to writeQueue.
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > >
> > > > Step 3 removes the WriteEntry from writeQueue:
> > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > >
> > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits
> > > > until writeQueue is empty or writeQueue.getFirst() == w.
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > >
> > > > So I think that while a handler thread is between step 2 and step 3, the
> > > > other handler threads can be kept waiting at step 1 until that thread
> > > > completes step 3.
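> > > >
> > > > To make the queueing concrete, here is a simplified, self-contained sketch
> > > > of the behaviour I am describing (illustrative only; this is not the actual
> > > > MultiVersionConsistencyControl code, and the class and method names are
> > > > made up):
> > > >
> > > >   import java.util.LinkedList;
> > > >
> > > >   // One queue per region: every write, for any row, goes through it.
> > > >   class RegionWideQueueSketch {
> > > >     private final LinkedList<Object> writeQueue = new LinkedList<Object>();
> > > >
> > > >     // Step 2 of the increment flow: enqueue an entry for this write (any row).
> > > >     synchronized Object beginWrite() {
> > > >       Object entry = new Object();
> > > >       writeQueue.add(entry);
> > > >       return entry;
> > > >     }
> > > >
> > > >     // Step 1: enqueue a marker entry and block until everything queued before
> > > >     // it (for ANY row in the region) has completed. This is where the handler
> > > >     // threads pile up.
> > > >     synchronized void waitForPreviousTransactionsComplete()
> > > >         throws InterruptedException {
> > > >       Object self = beginWrite();
> > > >       while (writeQueue.getFirst() != self) {
> > > >         wait();
> > > >       }
> > > >       writeQueue.remove(self);
> > > >       notifyAll();
> > > >     }
> > > >
> > > >     // Step 3: remove our entry and wake up everyone queued behind it.
> > > >     synchronized void completeWrite(Object entry) {
> > > >       writeQueue.remove(entry);
> > > >       notifyAll();
> > > >     }
> > > >   }
> > > >
> > > > Because there is a single writeQueue per MVCC instance (i.e. per region), an
> > > > entry can only advance once every entry queued before it, for any row, has
> > > > completed.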
> > > >
> > > > The blocking is depicted in this diagram:
> > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > >
> > > > Actually, in the thread dump of our region server, many handler threads
> > > > (RW.default.writeRpcServer.handler) wait at step 1
> > > > (waitForPreviousTransactionsComplete()):
> > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > >
> > > > Many handler threads wait at this line:
> > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > >
> > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > > > possible that all these threads are trying to get to the same row to update
> > > > > it? Could the app behavior have changed? Or are you thinking increment
> > > > > itself has slowed significantly?
> > > >
> > > > We have just upgraded HBase, not changed the app behavior. We think
> > > > increment itself has slowed significantly. Before upgrading HBase,
> > > > throughput and latency were good. Currently, to cope with this problem, we
> > > > split the regions finely.
> > > >
> > > > Thanks,
> > > >
> > > > Toshihiro Suzuki
> > > >
> > > >
> > > > 2015-09-09 15:29 GMT+09:00 Stack :
> > > >
> > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 (Toshihiro Suzuki) wrote:
> > > > >
> > > > > > Ted,
> > > > > >
> > > > > > Thank you for your response.
> > > > > >
> > > > > > I uploaded the complete stack trace to Gist:
> > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > > >
> > > > > > I think that the increment operation works as follows:
> > > > > >
> > > > > > 1. get row lock
> > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > > 4. get previous values
> > > > > > 5. create KVs
> > > > > > 6. write to Memstore
> > > > > > 7. write to WAL
> > > > > > 8. release row lock
> > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > > >
> > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
> > > > > > writes named writeQueue.
> > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
> > > > > > empty or writeQueue.getFirst() == w.
> > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
> > > > > > from writeQueue.
> > > > > >
> > > > > > I think that when a handler thread is processing between step 2 and
> > > > > > step 9, the other handler threads can wait until that thread completes
> > > > > > step 9.
> > > > >
> > > > > That is right. We need to read after all outstanding updates are done...
> > > > > because we need to read the latest update before we go to modify/increment
> > > > > it.
> > > > >
> > > > > How do you make out this?
> > > > >
> > > > > "A region lock (not a row lock) seems to occur in
> > > > > waitForPreviousTransactionsComplete()."
> > > > >
> > > > > In 0.98.x we did this:
> > > > >
> > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > >
> > > > > ... and in 1.0 we do this:
> > > > >
> > > > > mvcc.waitForPreviousTransactionsComplete(), which is this....
> > > > >
> > > > > + public void waitForPreviousTransactionsComplete() {
> > > > > +   WriteEntry w = beginMemstoreInsert();
> > > > > +   waitForPreviousTransactionsComplete(w);
> > > > > + }
> > > > >
> > > > > The mvcc and region sequenceid were merged in 1.0
> > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
> > > > > region sequenceid would spin independent of each other. Perhaps this is
> > > > > responsible for some of the slowdown.
> > > > >
> > > > > That said, looking in your thread dump, we seem to be down in the Get. If
> > > > > you do a bunch of thread dumps in a row, where is the lock-holding thread?
> > > > > In Get or writing Increment... or waiting on sequence id?
> > > > >
> > > > > Is it possible you are contending on a counter post-upgrade? Is it
> > > > > possible that all these threads are trying to get to the same row to update
> > > > > it? Could the app behavior have changed? Or are you thinking increment
> > > > > itself has slowed significantly?
> > > > >
> > > > > St.Ack
> > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Toshihiro Suzuki
> > > > > >
> > > > > >
> > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu :
> > > > > >
> > > > > > > In HRegion#increment(), we lock the row (not the region):
> > > > > > >
> > > > > > > try {
> > > > > > >   rowLock = getRowLock(row);
> > > > > > >
> > > > > > > Can you pastebin the complete stack trace?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 (Toshihiro Suzuki) wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > > > CDH5.4.5(HBase1.0.0) and we experience a slowdown in the increment
> > > > > > > > operation.
> > > > > > > >
> > > > > > > > Here's an extract from a thread dump of one of the RegionServers in
> > > > > > > > our cluster:
> > > > > > > >
> > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > > >   State: BLOCKED
> > > > > > > >   Blocked count: 21689888
> > > > > > > >   Waited count: 39828360
> > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > > >   Stack:
> > > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > > >     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > > >     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > > >
> > > > > > > > There are many similar threads in the thread dump.
> > > > > > > >
> > > > > > > > I read the source code and I think this is caused by changes to
> > > > > > > > MultiVersionConsistencyControl. A region lock (not a row lock) seems
> > > > > > > > to occur in waitForPreviousTransactionsComplete().
> > > > > > > >
> > > > > > > > We also wrote performance test code for the increment operation,
> > > > > > > > using 100 threads, and ran it in local mode.
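> > > > > > > >
> > > > > > > > The test is roughly of the following shape -- a simplified,
> > > > > > > > self-contained sketch using the 1.0-style client API, not the exact
> > > > > > > > code we ran; the table name, column names, and op counts are
> > > > > > > > placeholders, and in this sketch each thread increments its own row:
> > > > > > > >
> > > > > > > >   import java.util.concurrent.ExecutorService;
> > > > > > > >   import java.util.concurrent.Executors;
> > > > > > > >   import java.util.concurrent.TimeUnit;
> > > > > > > >   import org.apache.hadoop.conf.Configuration;
> > > > > > > >   import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > > > >   import org.apache.hadoop.hbase.TableName;
> > > > > > > >   import org.apache.hadoop.hbase.client.Connection;
> > > > > > > >   import org.apache.hadoop.hbase.client.ConnectionFactory;
> > > > > > > >   import org.apache.hadoop.hbase.client.Table;
> > > > > > > >   import org.apache.hadoop.hbase.util.Bytes;
> > > > > > > >
> > > > > > > >   public class IncrementBench {
> > > > > > > >     public static void main(String[] args) throws Exception {
> > > > > > > >       final int threads = 100;
> > > > > > > >       final int opsPerThread = 1000;
> > > > > > > >       Configuration conf = HBaseConfiguration.create();
> > > > > > > >       ExecutorService pool = Executors.newFixedThreadPool(threads);
> > > > > > > >       long startMs = System.currentTimeMillis();
> > > > > > > >       try (Connection conn = ConnectionFactory.createConnection(conf)) {
> > > > > > > >         for (int t = 0; t < threads; t++) {
> > > > > > > >           final byte[] row = Bytes.toBytes("row-" + t);
> > > > > > > >           pool.submit(new Runnable() {
> > > > > > > >             public void run() {
> > > > > > > >               // Each worker gets its own Table (Table is not thread-safe).
> > > > > > > >               try (Table table = conn.getTable(TableName.valueOf("bench"))) {
> > > > > > > >                 for (int i = 0; i < opsPerThread; i++) {
> > > > > > > >                   table.incrementColumnValue(row, Bytes.toBytes("f"),
> > > > > > > >                       Bytes.toBytes("count"), 1L);
> > > > > > > >                 }
> > > > > > > >               } catch (Exception e) {
> > > > > > > >                 e.printStackTrace();
> > > > > > > >               }
> > > > > > > >             }
> > > > > > > >           });
> > > > > > > >         }
> > > > > > > >         pool.shutdown();
> > > > > > > >         pool.awaitTermination(1, TimeUnit.HOURS);
> > > > > > > >       }
> > > > > > > >       long elapsedMs = System.currentTimeMillis() - startMs;
> > > > > > > >       long totalOps = (long) threads * opsPerThread;
> > > > > > > >       System.out.println("Throughput(op/s): " + (totalOps * 1000L / elapsedMs));
> > > > > > > >       System.out.println("Latency(ms): " + ((double) elapsedMs / opsPerThread));
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > >
> > > > > > > > (On 0.98 the equivalent test would use HTable/HConnectionManager
> > > > > > > > instead of Table/ConnectionFactory.)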
> > > > > > > >
> > > > > > > > The result is shown below:
> > > > > > > >
> > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > > >
> > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Toshihiro Suzuki