hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master
Date Wed, 25 May 2011 22:05:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039368#comment-13039368
] 

Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------

There's one major issue with my current patch: there's a race between the master's
OpenedRegionHandler and the events thread. It goes like this:

 - RS transitions a region to OPENING
 - RS transitions again to OPENING
 - Master receives the first event, reads ZK and sees OPENING
 - RS transitions to OPENED
 - Master receives the second event, reads ZK and sees OPENED instead of OPENING, kicks off
the OpenedRegionHandler
 - The handler will at some point delete the znode in the ZKW.getNodes structure (such a bad
method name) before deleting the actual znode
 - Master receives the third event, reads ZK, sees OPENED but finds that getNodes doesn't
contain the znode and considers this as a new region in transition so it adds it back in getNodes()
 - The handler deletes the znode
 - The Master does a no-op when it figures it cannot transition from OPEN to OPENED

At this point the region is assigned and everything is "fine"... until the master decides
for any reason to unassign the region. It sends the unassignment, receives an event but doesn't
process it in nodeChildrenChanged because ZKW.getNodes() already has it. From that point the
master will spin in "Region has been PENDING_CLOSE for too long" until it's put out of its
misery.
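
The interleaving above can be replayed deterministically. This is only a minimal sketch using in-memory stand-ins: the `zk` map, the `tracked` set (playing the role of the ZKW.getNodes structure), the state strings and every method name here are hypothetical and are not HBase's actual classes or API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StaleTrackingRace {
    // Stand-in for ZooKeeper: znode name -> state.
    static final Map<String, String> zk = new HashMap<>();
    // Stand-in for the in-memory znode set kept by the watcher.
    static final Set<String> tracked = new HashSet<>();

    // Events thread: read the znode and start tracking it if it exists.
    static String masterHandlesEvent(String region) {
        String state = zk.get(region);
        if (state != null) {
            tracked.add(region);
        }
        return state;
    }

    // Stand-in for nodeChildrenChanged: a child is processed only if it
    // exists and was not already tracked.
    static boolean childrenChangedProcesses(String region) {
        return zk.containsKey(region) && tracked.add(region);
    }

    // Replays the steps in order; returns whether the later close event
    // gets processed by the master.
    static boolean replay() {
        zk.clear();
        tracked.clear();
        String r = "region-1";
        zk.put(r, "OPENING");      // RS transitions to OPENING (event 1)
        zk.put(r, "OPENING");      // RS transitions again to OPENING (event 2)
        masterHandlesEvent(r);     // event 1: master sees OPENING
        zk.put(r, "OPENED");       // RS transitions to OPENED (event 3)
        masterHandlesEvent(r);     // event 2: master sees OPENED, starts the handler
        tracked.remove(r);         // handler: drops the in-memory entry first
        masterHandlesEvent(r);     // event 3: sees OPENED, entry missing, re-adds it
        zk.remove(r);              // handler: deletes the actual znode
        // The in-memory entry is now stale. Later the region is unassigned
        // and the RS creates the znode itself:
        zk.put(r, "CLOSING");
        return childrenChangedProcesses(r);
    }

    public static void main(String[] args) {
        System.out.println("close event processed: " + replay());
    }
}
```

In this replay the close event is skipped because the stale entry is still in the set, which matches the stuck PENDING_CLOSE behavior described above.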

The issue here is that the region server is creating the unassigned znode by itself, unlike
an assignment where it's the master that does it. Doing that in the master won't fully solve
the issue tho because in 0.92 the RS still creates znodes for splits and there's no way that
could be done by the master, as it would basically be like going back to how it used to
work.

So this is what Stack and I thought about:

 - The master needs to create the unassigned znode before telling a RS to close a region;
the RS will now just update it
 - ZKW needs to stop keeping track of the znodes, since getting into a situation where we have a
mismatch is too easy
 - The SplitTransaction will still create the znode, but it will then wait at the very end
until it gets deleted by the master. To make sure the master sees the change, it will tickle
the znode like we do for OPENING so that the master doesn't miss it
 - The method AssignmentManager.nodeChildrenChanged will only put watchers on znodes and won't
keep track of anything
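
A rough sketch of the first two points of that proposal, again with an in-memory map standing in for ZooKeeper. The class, the method names and the "CLOSING"/"CLOSED" state strings are made up for illustration; this is not the patch itself, and the split handling is left out.

```java
import java.util.HashMap;
import java.util.Map;

class MasterOwnedUnassign {
    // Stand-in for ZooKeeper: znode name -> state. No separate in-memory
    // set of znodes is kept anywhere.
    static final Map<String, String> zk = new HashMap<>();

    // Master: create the unassigned znode before asking the RS to close.
    static void masterUnassign(String region) {
        if (zk.putIfAbsent(region, "CLOSING") != null) {
            throw new IllegalStateException("znode already exists for " + region);
        }
        // The close request to the region server would be sent here.
    }

    // RS: only updates an existing znode from an expected state; it never
    // creates one. Returns false if the znode is missing or in another state.
    static boolean rsTransition(String region, String expected, String next) {
        return zk.replace(region, expected, next);
    }

    // Master event handling: always reads the current state from "ZK" and
    // deletes the znode once the region is reported closed.
    static String masterHandlesEvent(String region) {
        String state = zk.get(region);
        if ("CLOSED".equals(state)) {
            zk.remove(region);
        }
        return state;
    }

    public static void main(String[] args) {
        masterUnassign("region-1");
        rsTransition("region-1", "CLOSING", "CLOSED");
        System.out.println("master saw: " + masterHandlesEvent("region-1"));
    }
}
```

Because the only source of truth is the znode itself, there is no second structure that can disagree with it, which is the mismatch the race depended on.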

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789.patch
>
>
> The new master uses a lot of synchronized blocks to be safe, but it only takes a few
> jstacks to see that there's multiple layers of lock contention when a bunch of regions are
> moving (like when the balancer runs). The main culprits are regionInTransition in AssignmentManager,
> ZKAssign that uses ZKW.getZNnodes (basically another set of region in transitions), and locking
> at the RegionState level.
> My understanding is that even tho we have multiple threads to handle regions in transition,
> everything is actually serialized. Most of the time, lock holders are talking to ZK or a region
> server, which can take a few milliseconds.
> A simple example is when AssignmentManager wants to update the timers for all the regions
> on a RS, it will usually be waiting on another thread that's holding the lock while talking
> to ZK.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
