hbase-dev mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)
Date Tue, 01 Dec 2015 05:16:32 GMT
I'll try to get another one.  We are currently not seeing the issue due to
lack of contention (it is off hours for our customers).

Note that the stack trace I gave you was taken with a tool we have which
aggregates common stacks. The one at the bottom occurred 122 times (out of
128 handlers -- this is pre-tuning, before we added 1000 handlers and the
read vs write split).  So to me it looks like 122 of 128 handlers were waiting on:

if (!existingContext.latch.await(this.rowLockWaitDuration, TimeUnit.MILLISECONDS)) {
  throw new IOException("Timed out waiting for lock for row: " + rowKey);
}
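For context, here is a minimal, self-contained sketch of the kind of row-lock wait that snippet sits in (my own reconstruction for illustration, not the actual HBase source; the class and field names are made up). The idea: one latch per locked row, and any other handler that wants the same row parks on that latch until the owner releases it or the wait times out.

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class RowLockSketch {
  static class RowLockContext {
    final CountDownLatch latch = new CountDownLatch(1);
  }

  private final ConcurrentHashMap<String, RowLockContext> lockedRows = new ConcurrentHashMap<>();
  private final long rowLockWaitDuration = 30_000L; // hbase.rowlock.wait.duration default

  RowLockContext getRowLock(String rowKey) throws IOException, InterruptedException {
    while (true) {
      RowLockContext context = new RowLockContext();
      RowLockContext existingContext = lockedRows.putIfAbsent(rowKey, context);
      if (existingContext == null) {
        return context; // this handler now owns the row
      }
      // Same wait as in the snippet above: park until the owner releases, or time out.
      if (!existingContext.latch.await(rowLockWaitDuration, TimeUnit.MILLISECONDS)) {
        throw new IOException("Timed out waiting for lock for row: " + rowKey);
      }
    }
  }

  void releaseRowLock(String rowKey, RowLockContext context) {
    lockedRows.remove(rowKey, context);
    context.latch.countDown(); // wake every handler parked on this row
  }
}

With 122 of 128 handlers parked in that await, a single slow owner of a hot row is enough to idle nearly the whole write-handler pool.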


On Tue, Dec 1, 2015 at 12:08 AM Stack <stack@duboce.net> wrote:

> Looking at that stack trace, nothing showing as blocked or slowed by
> another operation. You have others I could look at Bryan?
> St.Ack
>
> On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
> > Yea sorry if I was misleading.  The nonce loglines we saw only happened on
> > full cluster restart; it may have been the HLogs replaying, not sure.
> >
> > We are still seeing slow Increments. Where Gets and Mutates will be on the
> > order of 50-150ms according to metrics, Increment will be in the
> > 1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
> >
> > https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
> >
> >
> >
> > On Mon, Nov 30, 2015 at 11:26 PM Stack <stack@duboce.net> wrote:
> >
> > > Still slow increments though?
> > >
> > > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > >
> > > > Those log lines have settled down; they may have been related to a
> > > > cluster-wide forced restart at the time.
> > > >
> > > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > > >
> > > > > We've been doing more debugging of this and have set up the read vs
> > > > > write handlers to try to at least segment this away so reads can work.
> > > > > We have pretty beefy servers, and are running with the following
> > > > > settings:
> > > > >
> > > > > hbase.regionserver.handler.count=1000
> > > > > hbase.ipc.server.read.threadpool.size=50
> > > > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > > > hbase.ipc.server.callqueue.scan.ratio=0.5
> > > > >
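A rough aside on what those settings work out to (a sketch based on my reading of how the 1.0-era RWQueueRpcExecutor divides call queues and handlers; exact rounding may differ): roughly 25 call queues, 15 of them for reads, and a 600/400 read/write handler split, which is where the "400 write handlers" mentioned below come from.

public class HandlerSplitSketch {
  public static void main(String[] args) {
    int handlers = 1000;           // hbase.regionserver.handler.count
    double handlerFactor = 0.025;  // hbase.ipc.server.callqueue.handler.factor
    double readRatio = 0.6;        // hbase.ipc.server.callqueue.read.ratio
    double scanRatio = 0.5;        // hbase.ipc.server.callqueue.scan.ratio

    int numQueues   = Math.max(1, (int) Math.round(handlers * handlerFactor)); // ~25 call queues
    int readQueues  = Math.max(1, (int) Math.round(numQueues * readRatio));    // ~15 of them for reads
    int writeQueues = numQueues - readQueues;                                  // ~10 for writes
    int scanQueues  = (int) Math.round(readQueues * scanRatio);                // ~8 of the read queues for scans

    // Handlers are divided in roughly the same read/write proportion as the queues.
    int readHandlers  = (int) Math.round(handlers * readRatio);                // ~600
    int writeHandlers = handlers - readHandlers;                               // ~400

    System.out.printf("queues=%d (read=%d, of which scan=%d; write=%d); handlers read=%d, write=%d%n",
        numQueues, readQueues, scanQueues, writeQueues, readHandlers, writeHandlers);
  }
}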
> > > > > We are seeing all 400 write handlers taken up by row locks for the most
> > > > > part. The read handlers are mostly idle. We're thinking of changing the
> > > > > ratio here, but are not sure it will help if they are all blocked on a
> > > > > row lock.  We just enabled DEBUG logging on all our servers and notice
> > > > > the following:
> > > > >
> > > > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
> > > > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
> > > > > 2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
> > > > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
> > > > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]
> > > > >
> > > > >
> > > > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > > > bbeaudreault@hubspot.com> wrote:
> > > > >
> > > > >> Sorry the second link should be
> > > > >>
> > > > >> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > > > >>
> > > > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > > > >> bbeaudreault@hubspot.com> wrote:
> > > > >>
> > > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > > > >>>
> > > > >>> An active handler:
> > > > >>>
> > > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > > > >>>
> > > > >>> One that is locked:
> > > > >>>
> > > > >>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > > > >>>
> > > > >>> The difference between pre-rollback and post is that previously we
> > > > >>> were seeing things blocked in mvcc.  Now we are seeing them blocked
> > > > >>> on the upsert.
> > > > >>>
> > > > >>> It always follows the same pattern, of 1 active handler in the upsert
> > > > >>> and the rest blocked waiting for it.
> > > > >>>
> > > > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <stack@duboce.net> wrote:
> > > > >>>
> > > > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > > > >>>> bbeaudreault@hubspot.com
> > > > >>>> > wrote:
> > > > >>>>
> > > > >>>> > The rollback seems to have mostly solved the issue for one of our
> > > > >>>> > clusters, but another one is still seeing long increment times:
> > > > >>>> >
> > > > >>>> > "slowIncrementCount": 52080,
> > > > >>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
> > > > >>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
> > > > >>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
> > > > >>>> > "Increment_99th_percentile": 1635.2399999999998
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > Any ideas if there are other changes that may be causing a
> > > > >>>> > performance regression for increments between CDH4.7.1 and CDH5.3.8?
> > > > >>>> >
> > > > >>>> >
> > > > >>>> >
> > > > >>>> No.
> > > > >>>>
> > > > >>>> Post a thread dump Bryan and it might prompt something.
> > > > >>>>
> > > > >>>> St.Ack
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >
> > > > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <stack@duboce.net>
> wrote:
> > > > >>>> >
> > > > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > > >>>> > > bbeaudreault@hubspot.com> wrote:
> > > > >>>> > >
> > > > >>>> > > > Should this be added as a known issue in the CDH or hbase
> > > > >>>> > > > documentation? It was a severe performance hit for us; all of
> > > > >>>> > > > our regionservers were sitting at a few thousand queued requests.
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > Let me take care of that.
> > > > >>>> > > St.Ack
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > > >>>> > > > bbeaudreault@hubspot.com>
> > > > >>>> > > > wrote:
> > > > >>>> > > >
> > > > >>>> > > > > Yea, they are all over the place and called from client and
> > > > >>>> > > > > coprocessor code. We ended up having no other option but to
> > > > >>>> > > > > roll back, and aside from a few NoSuchMethodErrors due to API
> > > > >>>> > > > > changes (Put#add vs Put#addColumn), it seems to be working and
> > > > >>>> > > > > fixing our problem.
> > > > >>>> > > > >
> > > > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <stack@duboce.net
> >
> > > > wrote:
> > > > >>>> > > > >
> > > > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on
> > > > >>>> > > > >> this now. Where are your counters Bryan? In their own column
> > > > >>>> > > > >> family or scattered about in a row with other Cell types?
> > > > >>>> > > > >> St.Ack
> > > > >>>> > > > >>
> > > > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > > >>>> > > > >> bbeaudreault@hubspot.com> wrote:
> > > > >>>> > > > >>
> > > > >>>> > > > >> > Is there any update to this? We just upgraded all of our
> > > > >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing
> > > > >>>> > > > >> > this JIRA listed in the known issues, did not know about
> > > > >>>> > > > >> > this.  Now we are seeing performance issues across all
> > > > >>>> > > > >> > clusters, as we make heavy use of increments.
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope
> > > > >>>> > > > >> > to roll back to CDH 5.3.1 (if that is possible)?
> > > > >>>> > > > >> >
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <
> > brfrn169@gmail.com
> > > >
> > > > >>>> wrote:
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > > Thank you St.Ack!
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > I would like to follow the ticket.
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > Toshihiro Suzuki
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <stack@duboce.net>:
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > > Back to this problem. Simple tests confirm that, as is,
> > > > >>>> > > > >> > > > the single-queue-backed MVCC instance can slow Region
> > > > >>>> > > > >> > > > ops if some other row is slow to complete. In particular
> > > > >>>> > > > >> > > > Increment, checkAndPut, and batch mutations are affected.
> > > > >>>> > > > >> > > > I opened HBASE-14460 to start in on a fix up. Let's see
> > > > >>>> > > > >> > > > if we can somehow scope mvcc to row or at least shard
> > > > >>>> > > > >> > > > mvcc so not all Region ops are paused.
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > St.Ack
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <
> > > > >>>> brfrn169@gmail.com>
> > > > >>>> > > wrote:
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > > > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > > > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > > > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > > > >>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > > > >>>> > > > >> > > > > > 'correct' at increment time?
> > > > >>>> > > > >> > > > > Yes, we need it.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > I would like to help if there is anything I can do.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > Thanks,
> > > > >>>> > > > >> > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <
> > stack@duboce.net
> > > >:
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying
> > > > >>>> > > > >> > > > > > helpful diagram). Makes sense. Let me hack up a test
> > > > >>>> > > > >> > > > > > case to help with the illustration. It is as though
> > > > >>>> > > > >> > > > > > the mvcc should be scoped to a row only... Writes
> > > > >>>> > > > >> > > > > > against other rows should not hold up my read of my
> > > > >>>> > > > >> > > > > > row. Tag an mvcc with a 'row' scope so we can see
> > > > >>>> > > > >> > > > > > which on-going writes pertain to current operation?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have it be
> > > > >>>> > > > >> > > > > > 'correct' at increment time?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > (This is a good one)
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > Thank you Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > St.Ack
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <
> > > > >>>> brfrn169@gmail.com
> > > > >>>> > >
> > > > >>>> > > > >> wrote:
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > > St.Ack,
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Thank you for your response.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > The reason I think that "A region lock (not a row
> > > > >>>> > > > >> > > > > > > lock) seems to occur in
> > > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's
> > > > >>>> > > > >> > > > > > > writeQueue can cause a region lock.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
> > > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert()
> > > > >>>> > > > >> > > > > > > to writeQueue and waits until writeQueue is empty
> > > > >>>> > > > >> > > > > > > or writeQueue.getFirst() == w.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > I think when a handler thread is processing between
> > > > >>>> > > > >> > > > > > > step 2 and step 3, the other handler threads can
> > > > >>>> > > > >> > > > > > > wait at step 1 until the thread completes step 3.
> > > > >>>> > > > >> > > > > > > This is depicted as follows:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Actually, in the thread dump of our region server,
> > > > >>>> > > > >> > > > > > > many handler threads (RW.default.writeRpcServer.handler)
> > > > >>>> > > > >> > > > > > > wait at Step 1 (waitForPreviousTransactionsComplete()).
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Many handler threads wait at this:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > >>>> > > > >> > > > > > >
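To make the shape of that wait concrete, here is a small self-contained model (my own simplification of the single writeQueue idea described above, not the real MultiVersionConsistencyControl class): waitForPreviousTransactionsComplete() only returns once this thread's entry reaches the head of one region-wide queue, so an Increment on one row sits behind an unfinished write on a completely different row.

import java.util.LinkedList;

class SingleQueueMvccSketch {
  static class WriteEntry {
    boolean completed = false;
  }

  private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

  // Roughly step 2 of the procedures above (without the sequence number):
  // register a new entry at the tail of the region-wide queue.
  WriteEntry beginMemstoreInsert() {
    synchronized (writeQueue) {
      WriteEntry e = new WriteEntry();
      writeQueue.add(e);
      return e;
    }
  }

  // Step 1: block until every entry queued before ours has completed. The
  // queue is per-region, not per-row, so a slow writer on any row stalls
  // every other handler that reaches this point.
  void waitForPreviousTransactionsComplete() throws InterruptedException {
    WriteEntry w = beginMemstoreInsert();
    synchronized (writeQueue) {
      while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
        writeQueue.wait();
      }
      completeLocked(w); // w is only a marker; retire it and wake the next waiter
    }
  }

  // Step 3: the in-flight write completes its entry, letting the queue head advance.
  void completeMemstoreInsert(WriteEntry e) {
    synchronized (writeQueue) {
      completeLocked(e);
    }
  }

  private void completeLocked(WriteEntry e) {
    e.completed = true;
    while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
      writeQueue.removeFirst();
    }
    writeQueue.notifyAll();
  }
}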
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > > Is it possible you are contending on a counter
> > > > >>>> > > > >> > > > > > > > post-upgrade? Is it possible that all these
> > > > >>>> > > > >> > > > > > > > threads are trying to get to the same row to
> > > > >>>> > > > >> > > > > > > > update it? Could the app behavior have changed?
> > > > >>>> > > > >> > > > > > > > Or are you thinking increment itself has slowed
> > > > >>>> > > > >> > > > > > > > significantly?
> > > > >>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app
> > > > >>>> > > > >> > > > > > > behavior. We are thinking increment itself has
> > > > >>>> > > > >> > > > > > > slowed significantly.
> > > > >>>> > > > >> > > > > > > Before upgrading HBase, we had good throughput and
> > > > >>>> > > > >> > > > > > > latency. Currently, to cope with this problem, we
> > > > >>>> > > > >> > > > > > > split the regions finely.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <
> > > > stack@duboce.net
> > > > >>>> >:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
> > > > >>>> > > brfrn169@gmail.com
> > > > >>>> > > > >
> > > > >>>> > > > >> > > wrote:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Ted,
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Thank you for your response.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I think that the increment operation works as follows:
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > 1. get row lock
> > > > >>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> > > > >>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> > > > >>>> > > > >> > > > > > > > > 4. get previous values
> > > > >>>> > > > >> > > > > > > > > 5. create KVs
> > > > >>>> > > > >> > > > > > > > > 6. write to Memstore
> > > > >>>> > > > >> > > > > > > > > 7. write to WAL
> > > > >>>> > > > >> > > > > > > > > 8. release row lock
> > > > >>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl
> > > > >>>> > > > >> > > > > > > > > has a pending queue of writes named writeQueue.
> > > > >>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and
> > > > >>>> > > > >> > > > > > > > > waits until writeQueue is empty or
> > > > >>>> > > > >> > > > > > > > > writeQueue.getFirst() == w.
> > > > >>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and
> > > > >>>> > > > >> > > > > > > > > step 9 removes the WriteEntry from writeQueue.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I think that when a handler thread is
> > > > >>>> > > > >> > > > > > > > > processing between step 2 and step 9, the other
> > > > >>>> > > > >> > > > > > > > > handler threads can wait until the thread
> > > > >>>> > > > >> > > > > > > > > completes step 9.
> > > > >>>> > > > >> > > > > > > > >
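Condensed into one method (my paraphrase of the nine steps Toshihiro lists above; the stub methods are placeholders for illustration, not HBase APIs), the ordering looks like this. Note that the region-wide MVCC wait in step 2 happens while the row lock from step 1 is already held:

class IncrementOutlineSketch {
  void increment(byte[] row) throws Exception {
    Object rowLock = getRowLock(row);                  // 1. row lock: serializes writers of this row only
    try {
      waitForPreviousTransactionsComplete();           // 2. region-wide wait on the mvcc writeQueue
      long mvccNum = beginMemstoreInsertWithSeqNum();  // 3. start this mvcc transaction
      readPreviousValues(row);                         // 4. get previous values
      createKVs(row, mvccNum);                         // 5. create KVs
      writeToMemstore(row);                            // 6. write to Memstore
      writeToWal(row);                                 // 7. write to WAL
    } finally {
      releaseRowLock(rowLock);                         // 8. release row lock
    }
    completeMemstoreInsertWithSeqNum();                // 9. complete the transaction; unblocks step 2 waiters
  }

  // Placeholder stubs so the outline compiles; the real logic lives in HRegion
  // and MultiVersionConsistencyControl.
  Object getRowLock(byte[] row) { return new Object(); }
  void releaseRowLock(Object lock) {}
  void waitForPreviousTransactionsComplete() {}
  long beginMemstoreInsertWithSeqNum() { return 0L; }
  void readPreviousValues(byte[] row) {}
  void createKVs(byte[] row, long mvccNum) {}
  void writeToMemstore(byte[] row) {}
  void writeToWal(byte[] row) {}
  void completeMemstoreInsertWithSeqNum() {}
}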
> > > > >>>> > > > >> > > > > > > > That is right. We need to read, after all
> > > > >>>> > > > >> > > > > > > > outstanding updates are done... because we need
> > > > >>>> > > > >> > > > > > > > to read the latest update before we go to
> > > > >>>> > > > >> > > > > > > > modify/increment it.
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > How do you make out this?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
> > > > >>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > In 0.98.x we did this:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > ... and in 1.0 we do this:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > +  public void waitForPreviousTransactionsComplete() {
> > > > >>>> > > > >> > > > > > > > +    WriteEntry w = beginMemstoreInsert();
> > > > >>>> > > > >> > > > > > > > +    waitForPreviousTransactionsComplete(w);
> > > > >>>> > > > >> > > > > > > > +  }
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
> > > > >>>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763).
> > > > >>>> > > > >> > > > > > > > Previously mvcc and region sequenceid would spin
> > > > >>>> > > > >> > > > > > > > independent of each other. Perhaps this is
> > > > >>>> > > > >> > > > > > > > responsible for some slowdown.
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem
> > > > >>>> > > > >> > > > > > > > to be down in the Get. If you do a bunch of thread
> > > > >>>> > > > >> > > > > > > > dumps in a row, where is the lock-holding thread?
> > > > >>>> > > > >> > > > > > > > In Get or writing Increment... or waiting on
> > > > >>>> > > > >> > > > > > > > sequence id?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > Is it possible you are contending on a counter
> > > > >>>> > > > >> > > > > > > > post-upgrade? Is it possible that all these threads
> > > > >>>> > > > >> > > > > > > > are trying to get to the same row to update it?
> > > > >>>> > > > >> > > > > > > > Could the app behavior have changed? Or are you
> > > > >>>> > > > >> > > > > > > > thinking increment itself has slowed significantly?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > St.Ack
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
> > > > >>>> > yuzhihong@gmail.com
> > > > >>>> > > >:
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not
> > > > >>>> > > > >> > > > > > > > > > the region):
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >     try {
> > > > >>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace?
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > Thanks
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕
> <
> > > > >>>> > > > >> brfrn169@gmail.com>
> > > > >>>> > > > >> > > > wrote:
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Hi,
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6)
> > > > >>>> > > > >> > > > > > > > > > > to CDH5.4.5(HBase1.0.0) and we are experiencing a
> > > > >>>> > > > >> > > > > > > > > > > slowdown in the increment operation.
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Here's an extract from a thread dump of the
> > > > >>>> > > > >> > > > > > > > > > > RegionServer of our cluster:
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > >>>> > > > >> > > > > > > > > > >   State: BLOCKED
> > > > >>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
> > > > >>>> > > > >> > > > > > > > > > >   Waited count: 39828360
> > > > >>>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
> > > > >>>> > > > >> > > > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > >>>> > > > >> > > > > > > > > > >   Stack:
> > > > >>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > >>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > >>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > There are many similar threads in the thread dump.
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > I read the source code and I think this is caused
> > > > >>>> > > > >> > > > > > > > > > > by changes to MultiVersionConsistencyControl.
> > > > >>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
> > > > >>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Also, we wrote performance test code for the increment
> > > > >>>> > > > >> > > > > > > > > > > operation that included 100 threads, and ran it in local mode.
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > The result is shown below:
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > >>>> > > > >> > > > > > > > > > >
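For reference, a multi-threaded increment benchmark of roughly that shape (a sketch only, not the actual test code behind the numbers above; the table, family, and qualifier names are invented, and it assumes the 1.0-era client API with ConnectionFactory/Table) might look like:

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementBench {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    int threads = 100;
    final int opsPerThread = 1000;
    final AtomicLong totalLatencyMs = new AtomicLong();

    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      long start = System.currentTimeMillis();
      for (int t = 0; t < threads; t++) {
        final int id = t;
        pool.submit(() -> {
          try (Table table = conn.getTable(TableName.valueOf("bench"))) {
            for (int i = 0; i < opsPerThread; i++) {
              // One row per thread; increment a single counter cell each time.
              Increment inc = new Increment(Bytes.toBytes("row-" + id));
              inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"), 1L);
              long t0 = System.currentTimeMillis();
              table.increment(inc);
              totalLatencyMs.addAndGet(System.currentTimeMillis() - t0);
            }
          } catch (IOException e) {
            e.printStackTrace();
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
      long elapsed = System.currentTimeMillis() - start;
      long ops = (long) threads * opsPerThread;
      System.out.printf("Throughput(op/s): %d, Latency(ms): %.3f%n",
          ops * 1000 / elapsed, totalLatencyMs.get() / (double) ops);
    }
  }
}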
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>
