From: Bryan Beaudreault
Date: Tue, 01 Dec 2015 00:59:44 +0000
Subject: Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)
To: HBase Dev List
Cc: Hbase-User

We've been doing more debugging of this and have set up the read vs write
handlers to try to at least segment this away so reads can work. We have
pretty beefy servers, and are running with the following settings:

hbase.regionserver.handler.count=1000
hbase.ipc.server.read.threadpool.size=50
hbase.ipc.server.callqueue.handler.factor=0.025
hbase.ipc.server.callqueue.read.ratio=0.6
hbase.ipc.server.callqueue.scan.ratio=0.5

We are seeing all 400 write handlers taken up by row locks for the most
part. The read handlers are mostly idle. We're thinking of changing the
ratio here, but are not sure it will help if they are all blocked on a row
lock. (A rough sketch of how these settings divide the handlers appears
after the quoted messages below.)

We just enabled DEBUG logging on all our servers and notice the following:

2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]

On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault wrote:

> Sorry the second link should be
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
>
> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>>
>> An active handler:
>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
>> One that is locked:
>> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
>>
>> The difference between pre-rollback and post is that previously we were
>> seeing things blocked in mvcc. Now we are seeing them blocked on the
>> upsert.
>>
>> It always follows the same pattern, of 1 active handler in the upsert and
>> the rest blocked waiting for it.
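[A rough, hypothetical sketch (plain Java, not HBase source) of how the callqueue settings quoted above would divide the 1000 handlers between reads and writes. The class name and the exact rounding are assumptions; the actual assignment is done inside HBase's RW queue executor and may differ slightly, but the resulting ~400 write handlers lines up with the "400 write handlers" figure mentioned above.]

  // Illustrative arithmetic only -- not HBase source. Values taken from the
  // settings quoted above; real queue/handler assignment may round differently.
  public class HandlerSplitSketch {
    public static void main(String[] args) {
      int handlerCount   = 1000;   // hbase.regionserver.handler.count
      double queueFactor = 0.025;  // hbase.ipc.server.callqueue.handler.factor
      double readRatio   = 0.6;    // hbase.ipc.server.callqueue.read.ratio
      double scanRatio   = 0.5;    // hbase.ipc.server.callqueue.scan.ratio

      int numQueues   = (int) Math.round(handlerCount * queueFactor);  // ~25 call queues
      int readQueues  = (int) Math.round(numQueues * readRatio);       // ~15 queues for reads
      int scanQueues  = (int) Math.round(readQueues * scanRatio);      // ~8 of those for scans
      int writeQueues = numQueues - readQueues;                        // ~10 queues for writes

      int readHandlers  = (int) Math.round(handlerCount * readRatio);  // ~600 read handlers
      int writeHandlers = handlerCount - readHandlers;                 // ~400 write handlers

      System.out.printf("queues: total=%d read=%d (scan=%d) write=%d%n",
          numQueues, readQueues, scanQueues, writeQueues);
      System.out.printf("handlers: read=%d write=%d%n", readHandlers, writeHandlers);
      // With every write handler stuck behind a row lock, writes back up even
      // though the read side stays idle -- which is what the thread describes.
    }
  }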
>>
>> On Mon, Nov 30, 2015 at 6:05 PM Stack wrote:
>>
>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
>>> bbeaudreault@hubspot.com> wrote:
>>>
>>> > The rollback seems to have mostly solved the issue for one of our
>>> > clusters, but another one is still seeing long increment times:
>>> >
>>> > "slowIncrementCount": 52080,
>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
>>> > "Increment_99th_percentile": 1635.2399999999998
>>> >
>>> > Any ideas if there are other changes that may be causing a performance
>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
>>> >
>>> No.
>>>
>>> Post a thread dump Bryan and it might prompt something.
>>>
>>> St.Ack
>>>
>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack wrote:
>>> >
>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
>>> > > bbeaudreault@hubspot.com> wrote:
>>> > >
>>> > > > Should this be added as a known issue in the CDH or hbase
>>> > > > documentation? It was a severe performance hit for us; all of our
>>> > > > regionservers were sitting at a few thousand queued requests.
>>> > > >
>>> > > Let me take care of that.
>>> > > St.Ack
>>> > >
>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
>>> > > > bbeaudreault@hubspot.com> wrote:
>>> > > >
>>> > > > > Yea, they are all over the place and called from client and
>>> > > > > coprocessor code. We ended up having no other option but to roll
>>> > > > > back, and aside from a few NoSuchMethodErrors due to API changes
>>> > > > > (Put#add vs Put#addColumn), it seems to be working and fixing our
>>> > > > > problem.
>>> > > > >
>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack wrote:
>>> > > > >
>>> > > > >> Rollback is untested. No fix in 5.5. I was going to work on this
>>> > > > >> now. Where are your counters Bryan? In their own column family or
>>> > > > >> scattered about in a row with other Cell types?
>>> > > > >> St.Ack
>>> > > > >>
>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
>>> > > > >> bbeaudreault@hubspot.com> wrote:
>>> > > > >>
>>> > > > >> > Is there any update to this? We just upgraded all of our
>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing this
>>> > > > >> > JIRA listed in the known issues, did not know about this. Now we
>>> > > > >> > are seeing performance issues across all clusters, as we make
>>> > > > >> > heavy use of increments.
>>> > > > >> >
>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to
>>> > > > >> > roll back to CDH 5.3.1 (if that is possible)?
>>> > > > >> >
>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 wrote:
>>> > > > >> >
>>> > > > >> > > Thank you St.Ack!
>>> > > > >> > >
>>> > > > >> > > I would like to follow the ticket.
>>> > > > >> > >
>>> > > > >> > > Toshihiro Suzuki
>>> > > > >> > >
>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack:
>>> > > > >> > >
>>> > > > >> > > > Back to this problem.
>>> > > > >> > > > Simple tests confirm that, as is, the single-queue-backed MVCC instance can slow Region ops if some other row is slow to complete. In particular Increment, checkAndPut, and batch mutations are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we can somehow scope mvcc to row or at least shard mvcc so not all Region ops are paused.
>>> > > > >> > > >
>>> > > > >> > > > St.Ack
>>> > > > >> > > >
>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 wrote:
>>> > > > >> > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful diagram). Makes sense. Let me hack up a test case to help with the illustration. It is as though the mvcc should be scoped to a row only... Writes against other rows should not hold up my read of my row. Tag an mvcc with a 'row' scope so we can see which on-going writes pertain to current operation?
>>> > > > >> > > > > Thank you St.Ack! I think this approach would work.
>>> > > > >> > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at increment time?
>>> > > > >> > > > > Yes, we need it.
>>> > > > >> > > > >
>>> > > > >> > > > > I would like to help if there is anything I can do.
>>> > > > >> > > > >
>>> > > > >> > > > > Thanks,
>>> > > > >> > > > > Toshihiro Suzuki
>>> > > > >> > > > >
>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack:
>>> > > > >> > > > >
>>> > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful diagram). Makes sense. Let me hack up a test case to help with the illustration. It is as though the mvcc should be scoped to a row only... Writes against other rows should not hold up my read of my row. Tag an mvcc with a 'row' scope so we can see which on-going writes pertain to current operation?
>>> > > > >> > > > > >
>>> > > > >> > > > > > You need to read back the increment and have it be 'correct' at increment time?
>>> > > > >> > > > > >
>>> > > > >> > > > > > (This is a good one)
>>> > > > >> > > > > >
>>> > > > >> > > > > > Thank you Toshihiro Suzuki
>>> > > > >> > > > > > St.Ack
>>> > > > >> > > > > >
>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > >
>>> > > > >> > > > > > > St.Ack,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thank you for your response.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > The reason I think "A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete();
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can cause a region lock.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) ->
>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits
>>> > > > >> > > > > > > until writeQueue is empty or writeQueue.getFirst() == w.
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > I think that when a handler thread is processing between step 2 and step 3,
>>> > > > >> > > > > > > the other handler threads can wait at step 1 until the thread completes step 3.
>>> > > > >> > > > > > > This is depicted as follows:
>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Actually, in the thread dump of our region server, many handler threads
>>> > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1
>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
>>> > > > >> > > > > > > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Many handler threads wait at this:
>>> > > > >> > > > > > > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
>>> > > > >> > > > > > > > update it? Could the app behavior have changed? Or are you thinking
>>> > > > >> > > > > > > > increment itself has slowed significantly?
>>> > > > >> > > > > > > We have just upgraded HBase, not changed the app behavior. We are
>>> > > > >> > > > > > > thinking increment itself has slowed significantly.
>>> > > > >> > > > > > > Before upgrading HBase, throughput and latency were good.
>>> > > > >> > > > > > > Currently, to cope with this problem, we split the regions finely.
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Thanks,
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack:
>>> > > > >> > > > > > >
>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > > Ted,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Thank you for your response.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I uploaded the complete stack trace to Gist.
>>> > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that increment operation works as follows:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 1. get row lock
>>> > > > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
>>> > > > >> > > > > > > > > 4. get previous values
>>> > > > >> > > > > > > > > 5. create KVs
>>> > > > >> > > > > > > > > 6. write to Memstore
>>> > > > >> > > > > > > > > 7. write to WAL
>>> > > > >> > > > > > > > > 8. release row lock
>>> > > > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl has a pending queue of
>>> > > > >> > > > > > > > > writes named writeQueue.
>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is
>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry
>>> > > > >> > > > > > > > > from writeQueue.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > I think that when a handler thread is processing between step 2 and
>>> > > > >> > > > > > > > > step 9, the other handler threads can wait until the thread completes step 9.
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > That is right. We need to read after all outstanding updates are done...
>>> > > > >> > > > > > > > because we need to read the latest update before we go to modify/increment it.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > How do you make out this?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > In 0.98.x we did this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > ... and in 1.0 we do this:
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is this....
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > + public void waitForPreviousTransactionsComplete() {
>>> > > > >> > > > > > > > +   WriteEntry w = beginMemstoreInsert();
>>> > > > >> > > > > > > > +   waitForPreviousTransactionsComplete(w);
>>> > > > >> > > > > > > > + }
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0
>>> > > > >> > > > > > > > (https://issues.apache.org/jira/browse/HBASE-8763). Previously, mvcc and
>>> > > > >> > > > > > > > region sequenceid would spin independently of each other. Perhaps this is
>>> > > > >> > > > > > > > responsible for some of the slowdown.
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down in the Get. If
>>> > > > >> > > > > > > > you do a bunch of thread dumps in a row, where is the lock-holding thread?
>>> > > > >> > > > > > > > In Get or writing Increment... or waiting on sequence id?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > Is it possible you are contending on a counter post-upgrade? Is it
>>> > > > >> > > > > > > > possible that all these threads are trying to get to the same row to
>>> > > > >> > > > > > > > update it? Could the app behavior have changed? Or are you thinking
>>> > > > >> > > > > > > > increment itself has slowed significantly?
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > St.Ack
>>> > > > >> > > > > > > >
>>> > > > >> > > > > > > > > Thanks,
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > Toshihiro Suzuki
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
>>> > > > >> > > > > > > > >
>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region):
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > try {
>>> > > > >> > > > > > > > > >   rowLock = getRowLock(row);
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Can you pastebin the complete stack trace?
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > Thanks
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn169@gmail.com> wrote:
>>> > > > >> > > > > > > > > >
>>> > > > >> > > > > > > > > > > Hi,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to CDH5.4.5(HBase1.0.0)
>>> > > > >> > > > > > > > > > > and we are experiencing a slowdown in increment operations.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Here's an extract from the thread dump of the RegionServer of our cluster:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
>>> > > > >> > > > > > > > > > >   State: BLOCKED
>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
>>> > > > >> > > > > > > > > > >   Waited count: 39828360
>>> > > > >> > > > > > > > > > >   Blocked on java.util.LinkedList@3474e4b2
>>> > > > >> > > > > > > > > > >   Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
>>> > > > >> > > > > > > > > > >   Stack:
>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native Method)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>> > > > >> > > > > > > > > > >     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>> > > > >> > > > > > > > > > >     java.lang.Thread.run(Thread.java:745)
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > There are many similar threads in the thread dump.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > I read the source code and I think this is caused by changes to
>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
>>> > > > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in
>>> > > > >> > > > > > > > > > > waitForPreviousTransactionsComplete().
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > We also wrote performance test code for the increment operation that
>>> > > > >> > > > > > > > > > > used 100 threads and ran it in local mode.
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > The results are shown below:
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Thanks,
>>> > > > >> > > > > > > > > > >
>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
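[A heavily simplified, hypothetical model (plain Java, not the actual MultiVersionConsistencyControl source) of the single region-wide write queue described in this thread: every write registers a WriteEntry in one shared queue, and waitForPreviousTransactionsComplete blocks until that entry reaches the head, so a slow write on any other row in the region delays it.]

  // Simplified, hypothetical model of the shared MVCC write queue discussed
  // above -- NOT the real HBase class. Names loosely mirror the HBase 1.0 /
  // CDH5.4.x MultiVersionConsistencyControl.
  import java.util.LinkedList;

  public class SimplifiedMvcc {
    public static final class WriteEntry {
      volatile boolean completed = false;
    }

    // One queue per region; entries for ALL rows share it.
    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

    // Roughly steps 2/3 in the thread: register an in-flight write.
    public synchronized WriteEntry beginMemstoreInsert() {
      WriteEntry e = new WriteEntry();
      writeQueue.add(e);
      return e;
    }

    // Roughly step 9: mark the write complete and advance the queue head.
    public synchronized void completeMemstoreInsert(WriteEntry e) {
      e.completed = true;
      while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
        writeQueue.removeFirst();
      }
      notifyAll();
    }

    // Roughly step 1: wait until every previously queued write -- on any row --
    // has completed. This is where the blocked handler threads pile up.
    public synchronized void waitForPreviousTransactionsComplete() throws InterruptedException {
      WriteEntry w = beginMemstoreInsert();
      while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
        wait();
      }
      completeMemstoreInsert(w);
    }
  }

Because the queue (and the wait) is per region rather than per row in this model, Increments, checkAndPuts, and batch mutations on unrelated rows end up serialized behind each other, which matches the blocked RW.default.writeRpcServer.handler threads in the thread dumps above.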