hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
Date Wed, 23 May 2012 15:42:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281678#comment-13281678
] 

ramkrishna.s.vasudevan commented on HBASE-6070:
-----------------------------------------------

I plan to make the following change in AM.nodeDeleted. SSH already handles regions in RIT
that are in the SPLITTING state, so doing the same in AM.nodeDeleted leads to a race.
{code}
-        if (rs.isSplitting() || rs.isSplit()) {
+        if (rs.isSplit()) {
           LOG.debug("Ephemeral node deleted, regionserver crashed?, " +
             "clearing from RIT; rs=" + rs);
           regionOffline(rs.getRegion());
{code}
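To illustrate the ordering problem, here is a minimal, self-contained sketch. It is a toy model under stated assumptions, not the actual AssignmentManager or ServerShutdownHandler code: the class `SplitRaceSketch`, the `Master` stand-in, the flag `clearSplittingOnNodeDeleted`, and the region name are all hypothetical, and SSH is reduced to "assign whatever it still finds in RIT".

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the race; none of these types are the real HBase classes. */
class SplitRaceSketch {

    enum State { SPLITTING, SPLIT }

    /** Hypothetical stand-in for the master's regions-in-transition (RIT) bookkeeping. */
    static final class Master {
        final Map<String, State> regionsInTransition = new HashMap<>();
        final boolean clearSplittingOnNodeDeleted;
        boolean assigned = false;

        Master(boolean clearSplittingOnNodeDeleted) {
            this.clearSplittingOnNodeDeleted = clearSplittingOnNodeDeleted;
        }

        /** Stand-in for AM.nodeDeleted: decides whether to clear the region from RIT. */
        void nodeDeleted(String region) {
            State state = regionsInTransition.get(region);
            if (state == null) {
                return;
            }
            // Old check: SPLITTING or SPLIT. Proposed check: SPLIT only.
            boolean clear = state == State.SPLIT
                    || (clearSplittingOnNodeDeleted && state == State.SPLITTING);
            if (clear) {
                regionsInTransition.remove(region);
            }
        }

        /** Stand-in for SSH: assigns only a region it still finds in RIT. */
        void serverShutdownHandler(String region) {
            if (regionsInTransition.remove(region) != null) {
                assigned = true;
            }
        }
    }

    /** Replays the problematic ordering: nodeDeleted fires before SSH checks RIT. */
    static boolean regionGetsAssigned(boolean clearSplittingOnNodeDeleted) {
        Master master = new Master(clearSplittingOnNodeDeleted);
        master.regionsInTransition.put("region-a", State.SPLITTING);
        master.nodeDeleted("region-a");
        master.serverShutdownHandler("region-a");
        return master.assigned;
    }

    public static void main(String[] args) {
        System.out.println("old check: assigned=" + regionGetsAssigned(true));
        System.out.println("new check: assigned=" + regionGetsAssigned(false));
    }
}
```

In this model the old check leaves the region unassigned, while the SPLIT-only check leaves the SPLITTING region in RIT for SSH to pick up. The real handlers run on separate threads, so either ordering can occur there; the sketch fixes the ordering only to show the failing case.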
Please provide your suggestions.
                
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
>                 Key: HBASE-6070
>                 URL: https://issues.apache.org/jira/browse/HBASE-6070
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.92.2, 0.96.0, 0.94.1
>
>
> We tried to address the problems with Master restart and RS restart while a region SPLIT is in progress as part of HBASE-5806.
> While looking further we found that one race condition still remains.
> 1. Split has just started and the znode is in RS_SPLIT state.
> 2. RS goes down.
> 3. First callback for SSH comes.
> 4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
> 5. But now the nodeDeleted event comes for the SPLIT node, and there we delete the region from RIT.
> 6. After this, SSH checks whether any region is in RIT. As it does not find the region in RIT, the region is never assigned.
> When we fixed HBASE-5806, step 6 happened first and then step 5, so we missed it. Now we have found it. Will come up with a patch shortly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
