hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18144) Forward-port the old exclusive row lock; there are scenarios where it performs better
Date Thu, 01 Jun 2017 06:11:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032513#comment-16032513 ]

stack commented on HBASE-18144:
-------------------------------

h1. Setup

In hbase-1.1.x and earlier, hbase took an exclusive lock on the WHOLE row whether a read,
write, or increment (read-modify-write). Absolute single-threaded access crassly ensured we
returned a consistent view. The hbase-1.1.x system was also indiscriminate about which queued
handler next got access; i.e. it was not ‘fair’.

In hbase-1.2.x, the hbase row lock was moved to use a reentrant read/write native java lock
as its core (HBASE-12751); we took the read lock whenever we read or mutated the row – MVCC
ensured consistent view while the row was under modification – and the write lock was taken
only for the usually ‘rare’ checkAndPut type operation; i.e. checkAndPut, checkAndDelete,
Increment, and Append. The non-exclusive access to the Row upped throughput (in part because
we were also able to simplify our read/write path using the new approach). The fairness helped
latencies. hbase-1.2.x has been out for a good while now. To my knowledge, no one has complained
about the new-style locking up to this point.
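
To make the contrast concrete, here is a minimal, standalone sketch of the two regimes as described above, using plain java.util.concurrent locks; it is an illustration only, not the actual HRegion code:
{code}
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RowLockRegimes {
  public static void main(String[] args) {
    // <= 1.1.x: one exclusive, unfair lock per row; reads, writes and
    // read-modify-writes all serialize on it.
    ReentrantLock oldStyleRowLock = new ReentrantLock(); // unfair by default
    oldStyleRowLock.lock();
    try {
      // read, write, or increment the row
    } finally {
      oldStyleRowLock.unlock();
    }

    // 1.2.x (HBASE-12751): a fair read/write lock per row; plain reads and
    // writes share the read lock (MVCC keeps the view consistent) while
    // checkAndPut, checkAndDelete, Increment and Append take the write lock.
    ReentrantReadWriteLock newStyleRowLock = new ReentrantReadWriteLock(true); // fair
    newStyleRowLock.readLock().lock(); // e.g. an ordinary Put
    try {
      // apply the mutation; MVCC handles visibility
    } finally {
      newStyleRowLock.readLock().unlock();
    }
    newStyleRowLock.writeLock().lock(); // e.g. an Increment
    try {
      // read-modify-write under exclusive access
    } finally {
      newStyleRowLock.writeLock().unlock();
    }
  }
}
{code}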

What happens though when you have a load that is mostly increments, all on a single row;
i.e. lots of exclusive write locks concurrent with lots of read locks?
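
For concreteness, that kind of load looks roughly like the client-side sketch below (the table, family, and qualifier names are made up); many handlers running this against the same row means many exclusive write-lock acquisitions under the 1.2.x regime:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HotRowIncrements {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t"))) {
      byte[] hotRow = Bytes.toBytes("hot-row"); // every client hammers this one row
      for (int i = 0; i < 1000; i++) {
        Increment inc = new Increment(hotRow);
        inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("counter"), 1L);
        table.increment(inc); // each Increment takes the row's write lock server-side
      }
    }
  }
}
{code}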

We found that the old exclusive lock performs way better than hbase-1.2.x read/write locking
at least in one extreme, so much so that we need to bring it back as an option, if only so we
can carry such 'extreme' increment-type workloads on newer hbases (we should work on getting
the write-only-increment working as first class in hbase, but that is for another issue).
This issue is about adding back the exclusive lock as a table option.

h1. Problem

In essence, at a user's deploy, all threads are in variations of the below:
{code}
"RpcServer.FifoWFPBQ.default.handler=190,queue=10,port=60020" #243 daemon prio=5 os_prio=0
tid=0x00007fbb58691800 nid=0x2d2527 waiting on condition [0x00007fbb4ca49000]
   java.lang.Thread.State: TIMED_WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x00000007c6001b38> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
  at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
  at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5171)
  at org.apache.hadoop.hbase.regionserver.HRegion.doIncrement(HRegion.java:7453)
{code}
Or this:
{code}
"RpcServer.FifoWFPBQ.default.handler=180,queue=0,port=60020" #233 daemon prio=5 os_prio=0
tid=0x00007fbb586ed800 nid=0x2d251d waiting on condition [0x00007fbb4d453000]
   java.lang.Thread.State: TIMED_WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0000000354976c00> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
  at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
  at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5171)
  at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3017)
…
{code}
All threads are stuck trying to get the row lock.

Various diggings had it that all 200 handlers (200 in this case) were NOT trying to get to
the same row but that there were about 16 rows being addressed with varying numbers of handlers
backed up (from 50 handlers in one case down to 10 or so at the other extreme).

It looks like a deadlock. You’d have expected one of the handlers to be inside holding the
lock, but in this thread dump there were none.

Thread dumps would stay in this stasis for minutes. Progress seemed to be going on because
the players changed across thread dumps (we figured the thread dump at ‘safe point’ was
not giving us a true picture of the actual state of affairs). Eventually we got over the hump
and the RegionServer came alive again.

Looking at the same load up on an hbase-1.0.x, we’d see handlers stuck trying to get locks
but only about 1/5th of all handlers would be stuck on trying to get locks at peak.

h1. Investigation

It took a while to arrive back at locking as the culprit. The tea leaves as noted above pointed
at locking but we wouldn’t buy it: “hbase-1.2.x has been shipping for ages…”, “Measurement
shows the new locking taking 10-20% longer, but lock-acquisition-time is so short that the
difference is probably immeasurable”, etc.
A big problem was our inability to faithfully reproduce the loading locally
([~esteban] and I made distinct attempts in two different environments). While eventually
in simulations we could reproduce the ‘weird’ thread dump the user was seeing, 1.2.x always
had overall better throughput than 1.0.x in our simulations (lock-acquisition-time was indeed
longer in 1.2.x... by about 20%).

Here are some of the paths we took following the wild goose (we spent a good while on this
issue – here is a sketch of what we were at):

h2. Row Key Hash

We spent time on the key used to finger locks. It was made of a hash of the row key bytes.
The hashing algorithm is poor. We speculated that clashes were broadening the scope of the lock,
or that there was a scenario where we obtained the lock with one hash but a clashing hash somehow
unlocked another, leaving locks in place.

Interesting findings included the fact that hashing took longer the longer the row key was
(user keys were 150-200 characters).

When we found that the row key hash in 1.1.x was the same as in 1.2.x, this put an end to this
particular pursuit.
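
For illustration, the failure mode we speculated about looks something like the sketch below; the class and method names are hypothetical, not the HBase row-lock code. If locks are pooled by a hash of the row key, a poor hash lets two distinct rows land on the same lock entry, and hashing cost grows with key length:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch only, not the actual HBase row-lock implementation.
class HashedRowLockPool {
  private final ConcurrentHashMap<Integer, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  ReentrantReadWriteLock lockFor(byte[] rowKey) {
    // Two different rows whose keys collide share the same entry, so a lock
    // taken on one can be contended (or released) on behalf of the other.
    int h = weakHash(rowKey);
    return locks.computeIfAbsent(h, k -> new ReentrantReadWriteLock(true));
  }

  // Deliberately weak hash: cost is linear in key length (the user's keys
  // were 150-200 characters) and collisions are easy to produce.
  private static int weakHash(byte[] key) {
    int h = 0;
    for (byte b : key) {
      h += b; // order-insensitive: "ab" and "ba" collide
    }
    return h;
  }
}
{code}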

h2. Compare

We expended a bunch of effort comparing locking in 1.0.x and 1.2.x. We measured how long lock
acquisition was taking (fair vs unfair, exclusive vs read/write) and how many locks were being
taken. It looked like 1.2.x was taking many more than 1.0.x. Comparing code bases, it looked
like we’d dropped an optimization going to 1.2 here:

{code}
commit b15a05d7b611726ff9b2dd1a3c5c9448b41e1474
Author: Enis Soztutar <enis@apache.org>
Date:   Wed Oct 28 18:16:32 2015 -0700

	HBASE-14689 Addendum and unit test for HBASE-13471
{code}
… a patch that was subsequently messily reverted then backported here:
{code}
commit bae2a970817d3693946c5b1d4e2382e56b1e94b8
Author: stack <stack@apache.org>
Date:   Wed Sep 30 11:48:46 2015 -0700

	HBASE-14465 Backport 'Allow rowlock to be reader/write' to branch-1

      // If we haven't got any rows in our batch, we should block to
      // get the next one.
-     boolean shouldBlock = numReadyToWrite == 0;
      RowLock rowLock = null;
      try {
-       rowLock = getRowLockInternal(mutation.getRow(), shouldBlock);
+       rowLock = getRowLock(mutation.getRow(), true);
      } catch (IOException ioe) {
        LOG.warn("Failed getting lock in batch put, row="
          + Bytes.toStringBinary(mutation.getRow()), ioe);
      }
      if (rowLock == null) {
        // We failed to grab another lock
-       assert !shouldBlock : "Should never fail to get lock when blocking";
+       assert false : "Should never fail to get lock when blocking";
{code}
It turned out we were counting locks incorrectly and there was a misunderstanding around lock
aggregation (HBASE-18074).

We even compared running on different JVM versions.

h2. Improving Lock-access time

The idea was to improve the 1.2 locking regime. We did profiling and then experiments: refactors
of lock acquisition making do w/ fewer Atomics and volatiles, and an investigation of row sorting
so we could take the lock once for a group of edits. We gave up on this route after seeing how
little improvement we could make, and after figuring that this code, which originally came from
FB, had had a lot of thought applied already, so pickings would be slim.
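
The row-sorting idea amounts to something like the following sketch; it is an illustration of the idea only, with the lock handling elided, not the HRegion batch path:
{code}
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class SortedBatch {
  // Sort the batch by row so consecutive edits on the same row can share a
  // single lock acquisition instead of re-acquiring per edit.
  static void applyBatch(byte[][] rows) {
    Arrays.sort(rows, Bytes.BYTES_COMPARATOR);
    byte[] prev = null;
    for (byte[] row : rows) {
      if (prev == null || Bytes.compareTo(prev, row) != 0) {
        // release the previous row's lock (if any) and take this row's lock once
      }
      // apply the edit for `row` under the already-held lock
      prev = row;
    }
    // release the last row's lock here
  }
}
{code}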

h1. Culprit: ReentrantReadWriteLock performance goes south if writes are frequent

RRWL is from circa 2005 (JDK5?); it is no good looking in the bug database for issues in 2017. But
from the javadoc:
"ReentrantReadWriteLocks can be used to improve concurrency in some uses of some kinds of
Collections. This is typically worthwhile only when the collections are expected to be large,
and entail operations with overhead that outweighs synchronization overhead."

And “...if updates become frequent then the data spends most of its time being exclusively
locked and there is little, if any increase in concurrency.”

Or Ben Manes in https://groups.google.com/forum/#!topic/guava-discuss/Vqu34OODWdw "Read/Write
(shared) locks are expensive to maintain, much more so than exclusive locks. If the lock hold
time is small then most often a regular lock (synchronized or ReentrantLock) is faster. It
is only when the lock hold time is large does the benefit of a shared lock become apparent."

This article on RWLock being AWFUL -- https://www.javacodegeeks.com/2014/06/java-8-stampedlocks-vs-readwritelocks-and-synchronized.html
-- gets dinged in the comments section for its testing technique, but the setup seemed to
resemble the user-under-scrutiny loading (hundreds of concurrent threads w/ a good portion
taking write locks); i.e. a read from hbase can turn into 1k Puts followed by 1k Increments
(IIUC), all on the one row.
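
A rough, self-contained sketch of that shape of experiment is below (not our actual test harness; the thread count, iteration count, and ~20% writer ratio are arbitrary): many threads, a tiny critical section, a fair RRWL as in 1.2.x versus a plain exclusive lock as in the old regime:
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockContention {
  // Run `threads` threads, each doing `iters` lock/unlock cycles; roughly one
  // in five threads uses the "write" side. Returns elapsed milliseconds.
  static long run(Lock readSide, Lock writeSide, int threads, int iters)
      throws InterruptedException {
    CountDownLatch start = new CountDownLatch(1);
    CountDownLatch done = new CountDownLatch(threads);
    for (int t = 0; t < threads; t++) {
      final boolean writer = (t % 5 == 0);
      new Thread(() -> {
        try {
          start.await();
          Lock lock = writer ? writeSide : readSide;
          for (int i = 0; i < iters; i++) {
            lock.lock();
            try { /* tiny critical section, like a short row-lock hold */ } finally { lock.unlock(); }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          done.countDown();
        }
      }).start();
    }
    long t0 = System.nanoTime();
    start.countDown();
    done.await();
    return (System.nanoTime() - t0) / 1_000_000;
  }

  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // fair, like 1.2.x
    ReentrantLock exclusive = new ReentrantLock();                 // like the old row lock
    System.out.println("read/write lock: " + run(rwl.readLock(), rwl.writeLock(), 200, 5_000) + " ms");
    System.out.println("exclusive lock : " + run(exclusive, exclusive, 200, 5_000) + " ms");
  }
}
{code}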

So, we decided to try forward porting the old mechanism, an exclusive (reentrant) lock rather
than an RRWL, as our locking basis.

Testing in my setup, though, the exclusive lock seemed to perform worse, but with the patch
deployed at the user's production, it does much better than the 1.2.x default locking regime.

Let me forward port the old locking mechanism in case anyone else has a workload like the
above user's.

> Forward-port the old exclusive row lock; there are scenarios where it performs better
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-18144
>                 URL: https://issues.apache.org/jira/browse/HBASE-18144
>             Project: HBase
>          Issue Type: Bug
>          Components: Increment
>    Affects Versions: 1.2.5
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0, 1.3.2, 1.2.7
>
>
> Description to follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
