hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
Date Thu, 21 Jun 2012 15:18:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398481#comment-13398481 ]

ramkrishna.s.vasudevan commented on HBASE-5875:
-----------------------------------------------

@Devs
{code}
+      // Make sure a -ROOT- location is set.
+      if (!isRootLocation()) return false;
+      // This guarantees that the transition assigning -ROOT- has completed
+      this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
+      assigned++;
{code}
and 
{code}
+      // Wait until META region added to region server onlineRegions. See HBASE-5875.
+      enableSSHandWaitForMeta();
+      assigned++;
{code}
This ensures that we wait for both ROOT and META.  Now that HBASE-5918 has gone in, SSH will
also be triggered if any RS goes down between the ROOT and META assignments.
The main intention of this patch is to avoid
{code}
splitLogAndExpireIfOnline(currentRootServer);
....
splitLogAndExpireIfOnline(currentMetaServer);
{code}
because, when ROOT or META was in RIT, the above code removed the currently active server,
treating it as dead, if ROOT or META was not yet online on the RS.
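
To make the race concrete, here is a minimal toy sketch of the two patterns. It is not HBase code: AssignmentRaceSketch, ToyRegionServer, verifyThenExpire and openLater are hypothetical names made up for illustration, and waitForAssignment only borrows its name from the patch. The assumption is that the RS finishes opening the region a little after the master triggers the assignment.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model only: none of these types or methods are HBase classes.
class AssignmentRaceSketch {

    /** Hypothetical region server that finishes opening a region asynchronously. */
    static final class ToyRegionServer {
        final String name;
        private final CountDownLatch opened = new CountDownLatch(1);

        ToyRegionServer(String name) { this.name = name; }

        void finishOpening() { opened.countDown(); }

        boolean hasRegionOnline() { return opened.getCount() == 0; }
    }

    /** Pre-patch pattern: a failed location check expires the server. */
    static boolean verifyThenExpire(ToyRegionServer rs, Set<String> onlineServers) {
        if (rs.hasRegionOnline()) return true;
        onlineServers.remove(rs.name); // a live server is wrongly dropped here
        return false;
    }

    /** Patched pattern: block until the open completes; the online list is untouched. */
    static boolean waitForAssignment(ToyRegionServer rs, long timeoutMs) {
        try {
            return rs.opened.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Simulates the RS completing the region open after a delay. */
    static Thread openLater(ToyRegionServer rs, long delayMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            rs.finishOpening();
        });
        t.start();
        return t;
    }

    public static void main(String[] args) {
        Set<String> online = new HashSet<>();
        online.add("rs1");
        ToyRegionServer rs = new ToyRegionServer("rs1");

        // Checking before the open has completed removes the only live server.
        System.out.println("verify ok: " + verifyThenExpire(rs, online) + ", online: " + online);

        // Waiting for the assignment instead leaves the online list alone.
        online.add("rs1");
        openLater(rs, 50);
        System.out.println("wait ok: " + waitForAssignment(rs, 5000) + ", online: " + online);
    }
}
```

In this toy model the first call drops rs1 from the online set even though it never went down, while the second only returns once the open has finished and never touches the set.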
                
> Process RIT and Master restart may remove an online server considering it as a dead server
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5875
>                 URL: https://issues.apache.org/jira/browse/HBASE-5875
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.94.1
>
>         Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, HBASE-5875_0.94_1.patch,
HBASE-5875_trunk.patch, HBASE-5875v2.patch
>
>
> If, on restart, the master finds ROOT/META in RIT state, it tries to assign the ROOT region through ProcessRIT.
> The master triggers the assignment and then tries to verify the root region location.
> Root region location verification checks whether the RS has the region in its online list.
> If the master-triggered assignment has not yet completed on the RS, the root region location verification fails.
> Because it failed, in
> {code}
> splitLogAndExpireIfOnline(currentRootServer);
> {code}
> we split the log and also remove the server from the online server list. Ideally there is nothing to split here, as no region server was restarted.
> So the master invalidates the region server even though it is online.
> In a special case, if I have only one RS, my cluster becomes non-operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
