hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17653) HBASE-17624 rsgroup synchronizations will (distributed) deadlock
Date Sat, 18 Feb 2017 03:56:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872975#comment-15872975 ]

Hudson commented on HBASE-17653:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #2524 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2524/])
HBASE-17653 HBASE-17624 rsgroup synchronizations will (distributed) (stack: rev b392de3e315aa260e2825484e418701919eb7622)
* (edit) hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java
* (edit) hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupsOfflineMode.java
* (edit) hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java
* (edit) hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java
* (edit) hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroups.java


> HBASE-17624 rsgroup synchronizations will (distributed) deadlock
> ----------------------------------------------------------------
>
>                 Key: HBASE-17653
>                 URL: https://issues.apache.org/jira/browse/HBASE-17653
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17653.master.001.patch, HBASE-17653.master.002.patch, HBASE-17653.master.003.patch
>
>
> Follow-on from HBASE-17624. HBASE-17624 made it so that only one thread at a time has
> access to the rsgroup administrator. At the tail of HBASE-17624, [~toffer] describes a
> scenario under which we may end up in a (distributed) deadlock. Let me repeat [~toffer]'s comment...
> {code}
> Both read/write access can't be single threaded. Consider the situation:
> 1. move_rsgroup_servers is called
> 2. while #1 is happening, the rsgroup region is in transition (the rpc thread in #1 holds the monitor lock)
> 3. while #2 is happening, meta is in transition.
> The balancer, trying to figure out a plan for the meta region, tries to get the monitor lock but can't.
> The rpc thread won't release the monitor lock since the rsgroup region never gets assigned. The rsgroup
> region never gets assigned because it can't update meta with its new state.
> There's a good chance this can be reproduced just by moving both the rsgroup and meta regions
> onto the same RS and calling move_rsgroup_servers on that same RS.
> A bunch of different actors will query for group affiliation, so we can't have writes block reads.
> ....
> In the code prior to this patch, the getter methods that retrieve group information (getRSGroup,
> ofTable, OfServer, etc) don't require the monitor lock, so the deadlock cycle is broken.
> ....
> The methods that do mutations and updates to zk and hbase:rsgroup are synchronized
> appropriately. Point me to where the incoherence is?
> {code}
> This issue is about testing/fixing/restoring rsgroup access. Will be back.
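
The point made in the quoted comment, that getters should not need the monitor lock while mutations stay synchronized, can be sketched roughly as below. This is only an illustration under stated assumptions, not the HBase code: the class `GroupRegistry`, its methods `getGroupOfServer` and `moveServer`, and the fallback group name are hypothetical and stand in for the real rsgroup manager.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT RSGroupInfoManagerImpl.
class GroupRegistry {
    // Assumed fallback group name for servers with no explicit assignment.
    static final String DEFAULT_GROUP = "default";

    // Readers see an immutable snapshot through a volatile reference,
    // so they never wait on the monitor lock held by a writer.
    private volatile Map<String, String> serverToGroup = Collections.emptyMap();

    // Read path: not synchronized, so a slow mutation cannot block it.
    String getGroupOfServer(String server) {
        return serverToGroup.getOrDefault(server, DEFAULT_GROUP);
    }

    // Write path: mutations are serialized on the monitor lock.
    synchronized void moveServer(String server, String group) {
        Map<String, String> copy = new HashMap<>(serverToGroup);
        copy.put(server, group);
        // A real manager would persist the change at this point, which may
        // be slow; readers keep using the previous snapshot meanwhile.
        serverToGroup = Collections.unmodifiableMap(copy);
    }
}
```

With this shape, a caller such as a balancer asking for a server's group returns immediately from the last published snapshot even while another thread is inside `moveServer`, which is the property the comment says breaks the deadlock cycle.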



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
