hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stack <st...@duboce.net>
Subject Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)
Date Tue, 22 Sep 2015 05:14:28 GMT
Back to this problem. Simple tests confirm that as is, the
single-queue-backed MVCC instance can slow Region ops if some other row is
slow to complete. In particular Increment, checkAndPut, and batch mutations
are effected. I opened HBASE-14460 to start in on a fix up. Lets see if we
can somehow scope mvcc to row or at least shard mvcc so not all Region ops
are paused.

St.Ack


On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:

> > Thank you for the below reasoning (with accompanying helpful diagram).
> > Makes sense. Let me hack up a test case to help with the illustration. It
> > is as though the mvcc should be scoped to a row only... Writes against
> > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > scope so we can see which on-going writes pertain to current operation?
> Thank you St.Ack! I think this approach would work.
>
> > You need to read back the increment and have it be 'correct' at increment
> > time?
> Yes, we need it.
>
> I would like to help if there is anything I can do.
>
> Thanks,
> Toshihiro Suzuki
>
>
> 2015-09-13 14:11 GMT+09:00 Stack <stack@duboce.net>:
>
> > Thank you for the below reasoning (with accompanying helpful diagram).
> > Makes sense. Let me hack up a test case to help with the illustration. It
> > is as though the mvcc should be scoped to a row only... Writes against
> > other rows should not hold up my read of my row. Tag an mvcc with a 'row'
> > scope so we can see which on-going writes pertain to current operation?
> >
> > You need to read back the increment and have it be 'correct' at increment
> > time?
> >
> > (This is a good one)
> >
> > Thank you Toshihiro Suzuki
> > St.Ack
> >
> > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
> >
> > > St.Ack,
> > >
> > > Thank you for your response.
> > >
> > > Why I make out that "A region lock (not a row lock) seems to occur in
> > > waitForPreviousTransactionsComplete()" is as follows:
> > >
> > > A increment operation has 3 procedures for MVCC.
> > >
> > > 1. mvcc.waitForPreviousTransactionsComplete();
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > >
> > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > >
> > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > >
> > >
> > > I think that MultiVersionConsistencyControl's writeQueue can cause a
> > region
> > > lock.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > >
> > >
> > > Step 2 adds to a WriteEntry to writeQueue.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > >
> > > Step 3 removes the WriteEntry from writeQueue.
> > >
> > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > >
> > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and
> > waits
> > > until writeQueue is empty or writeQueue.getFirst() == w.
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > >
> > >
> > > I think when a handler thread is processing between step 2 and step 3,
> > the
> > > other handler threads can wait at step 1 until the thread completes
> step
> > 3
> > > This is depicted as follows:
> > >
> > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > >
> > >
> > > Actually, in the thread dump of our region server, many handler threads
> > > (RW.default.writeRpcServer.handler) wait at Step 1
> > > (waitForPreviousTransactionsComplete()).
> > >
> > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > >
> > > Many handler threads wait at this:
> > >
> > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > >
> > >
> > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > possible that all these threads are trying to get to the same row to
> > > update
> > > > it? Could the app behavior have changed?  Or are you thinking
> increment
> > > > itself has slowed significantly?
> > > We have just upgraded HBase, not changed the app behavior. We are
> > thinking
> > > increment itself has slowed significantly.
> > > Before upgrading HBase, it was good throughput and latency.
> > > Currently, to cope with this problem, we split the regions finely.
> > >
> > > Thanks,
> > >
> > > Toshihiro Suzuki
> > >
> > >
> > > 2015-09-09 15:29 GMT+09:00 Stack <stack@duboce.net>:
> > >
> > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com>
wrote:
> > > >
> > > > > Ted,
> > > > >
> > > > > Thank you for your response.
> > > > >
> > > > > I uploaded the complete stack trace to Gist.
> > > > >
> > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > >
> > > > >
> > > > > I think that increment operation works as follows:
> > > > >
> > > > > 1. get row lock
> > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior
> > > MVCC
> > > > > transactions to finish
> > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > > 4. get previous values
> > > > > 5. create KVs
> > > > > 6. write to Memstore
> > > > > 7. write to WAL
> > > > > 8. release row lock
> > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the
> > transaction
> > > > >
> > > > > A instance of MultiVersionConsistencyControl has a pending queue
of
> > > > writes
> > > > > named writeQueue.
> > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue
> > is
> > > > > empty or writeQueue.getFirst() == w.
> > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the
> > > WriteEntry
> > > > > from writeQueue.
> > > > >
> > > > > I think that when a handler thread is processing between step 2 and
> > > step
> > > > 9,
> > > > > the other handler threads can wait until the thread completes step
> 9.
> > > > >
> > > > >
> > > > That is right. We need to read, after all outstanding updates are
> > done...
> > > > because we need to read the latest update before we go to
> > > modify/increment
> > > > it.
> > > >
> > > > How do you make out this?
> > > >
> > > > "A region lock (not a row lock) seems to occur in
> > > > waitForPreviousTransactionsComplete()."
> > > >
> > > > In 0.98.x we did this:
> > > >
> > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > >
> > > > ... and in 1.0 we do this:
> > > >
> > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > >
> > > > +  public void waitForPreviousTransactionsComplete() {
> > > > +    WriteEntry w = beginMemstoreInsert();
> > > > +    waitForPreviousTransactionsComplete(w);
> > > > +  }
> > > >
> > > > The mvcc and region sequenceid were merged in 1.0 (
> > > > https://issues.apache.org/jira/browse/HBASE-8763). Previous mvcc and
> > > > region
> > > > sequenceid would spin independent of each other. Perhaps this
> > responsible
> > > > for some slow down.
> > > >
> > > > That said, looking in your thread dump, we seem to be down in the
> Get.
> > If
> > > > you do a bunch of thread dumps in a row, where is the lock-holding
> > > thread?
> > > > In Get or writing Increment... or waiting on sequence id?
> > > >
> > > > Is it possible you are contending on a counter post-upgrade?  Is it
> > > > possible that all these threads are trying to get to the same row to
> > > update
> > > > it? Could the app behavior have changed?  Or are you thinking
> increment
> > > > itself has slowed significantly?
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > > > Thanks,
> > > > >
> > > > > Toshihiro Suzuki
> > > > >
> > > > >
> > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
> > > > >
> > > > > > In HRegion#increment(), we lock the row (not region):
> > > > > >
> > > > > >     try {
> > > > > >       rowLock = getRowLock(row);
> > > > > >
> > > > > > Can you pastebin the complete stack trace ?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com>
wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to
> > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > and we experience slowdown in increment operation.
> > > > > > >
> > > > > > > Here's an extract from thread dump of the RegionServer
of our
> > > > cluster:
> > > > > > >
> > > > > > > Thread 68
> > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > > > >   State: BLOCKED
> > > > > > >   Blocked count: 21689888
> > > > > > >   Waited count: 39828360
> > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > > > >   Blocked by 63
> > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > > > >   Stack:
> > > > > > >     java.lang.Object.wait(Native Method)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > > > >
> > > > > > >
> > > > >
> > >
> org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > > > >
> >  org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > > > >
> >  org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > > > >
> > > >  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > > > >
> > > > > > > There are many similar threads in the thread dump.
> > > > > > >
> > > > > > > I read the source code and I think this is caused by changes
of
> > > > > > > MultiVersionConsistencyControl.
> > > > > > > A region lock (not a row lock) seems to occur in
> > > > > > > waitForPreviousTransactionsComplete().
> > > > > > >
> > > > > > >
> > > > > > > Also we wrote performance test code for increment operation
> that
> > > > > included
> > > > > > > 100 threads and ran it in local mode.
> > > > > > >
> > > > > > > The result is shown below:
> > > > > > >
> > > > > > > CDH5.3.1(HBase0.98.6)
> > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > > > >
> > > > > > > CDH5.4.5(HBase1.0.0)
> > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Toshihiro Suzuki
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message