hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master
Date Fri, 27 May 2011 22:20:47 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:

    Attachment: HBASE-3789-v3-0.90.patch

With the previous patch, all the tests passed except for hbck. Looking deeper, I see that hbck
creates its own znodes, so the master no longer sees them. It's not clear to me why it isn't using
HBA.assign instead of the trickery with HBCK_CODE_NAME.

This patch modifies hbck so that it uses "normal" tools provided by the master instead of
bypassing it.

I'm also working on porting this to trunk. I got the previous patch I posted working, but I
haven't done the hbck part yet because it's different.

Also, I still haven't touched the splitting code in trunk.

> Cleanup the locking contention in the master
> --------------------------------------------
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>         Attachments: HBASE-3789-v2-0.90.patch, HBASE-3789-v3-0.90.patch, HBASE-3789.patch
> The new master uses a lot of synchronized blocks to be safe, but it only takes a few
jstacks to see that there are multiple layers of lock contention when a bunch of regions are
moving (like when the balancer runs). The main culprits are regionInTransition in AssignmentManager,
ZKAssign that uses ZKW.getZNnodes (basically another set of regions in transition), and locking
at the RegionState level.
> My understanding is that even though we have multiple threads to handle regions in transition,
everything is actually serialized. Most of the time, lock holders are talking to ZK or a region
server, which can take a few milliseconds.
> A simple example is when AssignmentManager wants to update the timers for all the regions
on a RS, it will usually be waiting on another thread that's holding the lock while talking
to ZK.
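
To illustrate the contention pattern described in the issue, here is a minimal Java sketch. It is not HBase code: the class RegionTimers, its methods, and the simulated remote call are all hypothetical stand-ins, and it uses Java 8 syntax for brevity. It contrasts a handler that holds the shared lock across a slow ZK-like call with one that makes the call first and takes the lock only for the in-memory update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegionTimers {
    // Hypothetical stand-in for the master's regions-in-transition map.
    private final Map<String, Long> regionsInTransition = new HashMap<>();
    private final long remoteCallMillis;

    public RegionTimers(long remoteCallMillis) {
        this.remoteCallMillis = remoteCallMillis;
    }

    // Simulates a round trip to ZK or a region server (assumption: a fixed sleep).
    private void slowRemoteCall() {
        try {
            Thread.sleep(remoteCallMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Contended variant: the remote call happens while the lock is held,
    // so concurrent handlers run one after another.
    public void transitionHoldingLock(String region) {
        synchronized (regionsInTransition) {
            slowRemoteCall();
            regionsInTransition.put(region, System.nanoTime());
        }
    }

    // Narrowed variant: the remote call happens outside the lock, which is
    // taken only for the short in-memory update.
    public void transitionNarrowLock(String region) {
        slowRemoteCall();
        synchronized (regionsInTransition) {
            regionsInTransition.put(region, System.nanoTime());
        }
    }

    // Timer refresh: needs the same lock, so in the contended variant it
    // waits behind every in-flight remote call.
    public int updateTimers() {
        synchronized (regionsInTransition) {
            long now = System.nanoTime();
            regionsInTransition.replaceAll((region, stamp) -> now);
            return regionsInTransition.size();
        }
    }

    // Runs the same work on several threads and returns elapsed milliseconds.
    public static long timeConcurrent(int threads, Runnable work) {
        List<Thread> pool = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(work);
            pool.add(t);
            t.start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With several threads, the first variant's wall-clock time grows roughly with the number of handlers, while the second stays close to a single remote call. Whether a given critical section in the master can safely be narrowed this way depends on invariants this sketch does not model.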

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
