hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4479) TestMasterFailover failure in Hbase-0.92#17
Date Mon, 03 Oct 2011 12:10:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119258#comment-13119258 ]

ramkrishna.s.vasudevan commented on HBASE-4479:
-----------------------------------------------

As per my analysis:
when the master is coming up and fails because of a ZooKeeper exception, abort() is called, and abort() tries once more to recover in case things are back to normal:
{code}
if (t != null && t instanceof KeeperException.SessionExpiredException) {
  try {
    LOG.info("Primary Master trying to recover from ZooKeeper session " +
        "expiry.");
    return !tryRecoveringExpiredZKSession();
  } catch (Throwable newT) {
    LOG.error("Primary master encountered unexpected exception while " +
        "trying to recover from ZooKeeper session" +
        " expiry. Proceeding with server abort.", newT);
  }
}
{code}
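To make the control flow of the quoted abort() logic concrete, here is a minimal, self-contained Java sketch. Everything in it is hypothetical: the class, the local SessionExpiredException stand-in, and the BooleanSupplier playing the role of tryRecoveringExpiredZKSession() are illustrations, not HBase or ZooKeeper APIs. It shows the point that matters for this issue: if the recovery step itself throws (for example an IllegalArgumentException raised while reassigning regions), the exception is caught and the master proceeds with the abort.

```java
import java.util.function.BooleanSupplier;

// Hypothetical, simplified model of the abort() recovery path; not HBase code.
public class AbortRecoverySketch {

    // Local stand-in for ZooKeeper's KeeperException.SessionExpiredException.
    static class SessionExpiredException extends Exception {
        SessionExpiredException(String msg) {
            super(msg);
        }
    }

    // Stand-in for tryRecoveringExpiredZKSession(): returns true if recovery succeeded.
    // In the scenario described in this comment, recovery also re-runs region assignment.
    private final BooleanSupplier tryRecovering;

    AbortRecoverySketch(BooleanSupplier tryRecovering) {
        this.tryRecovering = tryRecovering;
    }

    // Returns true if the master should really abort.
    boolean abortNow(Throwable t) {
        // instanceof already evaluates to false for null, so no separate null check is needed.
        if (t instanceof SessionExpiredException) {
            try {
                System.out.println("Primary master trying to recover from ZooKeeper session expiry.");
                return !tryRecovering.getAsBoolean();
            } catch (Throwable newT) {
                // Any exception thrown during recovery falls through to a real abort.
                System.out.println("Recovery failed (" + newT + "); proceeding with server abort.");
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Throwable expired = new SessionExpiredException("session expired");

        // Recovery succeeds: abortNow returns false, so the master stays up.
        System.out.println(new AbortRecoverySketch(() -> true).abortNow(expired));

        // Recovery throws, e.g. an IllegalArgumentException from Random.nextInt(0)
        // during region assignment: abortNow returns true and the master aborts anyway.
        AbortRecoverySketch failing = new AbortRecoverySketch(() -> {
            throw new IllegalArgumentException("bound must be positive");
        });
        System.out.println(failing.abortNow(expired));
    }
}
```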
During this recovery we try to assign ROOT and META, but the RegionServer hosting them may not be online at that time.
So we carry on with processRIT(), and since this is a clean cluster startup, we call assignAllUserRegions(), which in turn calls retainAssignment().
Here the number of online servers itself is 0, so the following loop adds nothing:
{code}
for (ServerName server : servers) {
  assignments.put(server, new ArrayList<HRegionInfo>());
}
{code}
Hence assignments.size() is 0, and the fallback branch below ends up calling RANDOM.nextInt(0):
{code}
else {
  int size = assignments.size();
  assignments.get(servers.get(RANDOM.nextInt(size))).add(region.getKey());
}
{code}
Random.nextInt(0) throws an IllegalArgumentException, which makes the master abort.
Though this showed up as a test case failure, there is a small chance it can also happen in a real deployment, in which case the attempt to bring the master back after a ZK session expiry would fail because of this.
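The failing path in retainAssignment() can be reduced to a few lines of plain Java. This is a hypothetical sketch, assuming String in place of ServerName and HRegionInfo and a map from each region to the server that hosted it before the restart; retainAssignmentSketch is an illustrative name, not the HBase method. With an empty server list the assignments map stays empty, and the fallback branch calls Random.nextInt(0), which the JDK rejects with an IllegalArgumentException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical reduction of the failing path; String stands in for ServerName and HRegionInfo.
public class EmptyServerListRepro {
    private static final Random RANDOM = new Random();

    // regions maps each region to the server that hosted it before the restart.
    static Map<String, List<String>> retainAssignmentSketch(Map<String, String> regions,
                                                            List<String> servers) {
        Map<String, List<String>> assignments = new HashMap<>();
        // With no online servers this loop never runs, so the map stays empty.
        for (String server : servers) {
            assignments.put(server, new ArrayList<>());
        }
        for (Map.Entry<String, String> region : regions.entrySet()) {
            String previous = region.getValue();
            if (previous != null && servers.contains(previous)) {
                assignments.get(previous).add(region.getKey());
            } else {
                int size = assignments.size(); // 0 when no servers are online
                // Random.nextInt(bound) throws IllegalArgumentException unless bound > 0.
                assignments.get(servers.get(RANDOM.nextInt(size))).add(region.getKey());
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        try {
            retainAssignmentSketch(Map.of("region-1", "old-rs"), new ArrayList<>());
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e);
        }
    }
}
```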

Please correct me if I am wrong.

> TestMasterFailover failure in Hbase-0.92#17
> -------------------------------------------
>
>                 Key: HBASE-4479
>                 URL: https://issues.apache.org/jira/browse/HBASE-4479
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Minor
>
> When the master restarted, it was not able to get any servers online, and the restart was a clean restart.
> Hence there were no regions to assign.
> Hence retainAssignment tries to pick one of the servers for a region using RANDOM.nextInt(size), where size is 0.
> Random.nextInt() does not accept 0 here, so we got an exception that kept the master from coming up, and the test case timed out.
> We still need to check whether it was really expected that no regions were available when the master came up, but the intent of this JIRA is to handle the scenario where the size can be 0.
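
One possible way to handle the size == 0 case is to check for an empty server list before falling back to a random server. The sketch below assumes simplified types (String stands in for ServerName and HRegionInfo, and a map from each region to its previous server); the class and method names are hypothetical, and this is not the patch committed for HBASE-4479. Returning an empty map leaves it to the caller to wait for RegionServers or retry instead of aborting the master.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical guarded variant; String stands in for ServerName and HRegionInfo.
public class GuardedRetainAssignment {
    private static final Random RANDOM = new Random();

    // regions maps each region to the server that hosted it before the restart.
    static Map<String, List<String>> retainAssignmentSketch(Map<String, String> regions,
                                                            List<String> servers) {
        Map<String, List<String>> assignments = new HashMap<>();
        // Guard: with no online servers there is nothing to assign to,
        // so return an empty plan instead of calling Random.nextInt(0).
        if (servers == null || servers.isEmpty()) {
            return assignments;
        }
        for (String server : servers) {
            assignments.put(server, new ArrayList<>());
        }
        for (Map.Entry<String, String> region : regions.entrySet()) {
            String previous = region.getValue();
            if (previous != null && servers.contains(previous)) {
                // The old server is still online: retain the assignment.
                assignments.get(previous).add(region.getKey());
            } else {
                // Otherwise pick a random online server; servers.size() > 0 is guaranteed here.
                String target = servers.get(RANDOM.nextInt(servers.size()));
                assignments.get(target).add(region.getKey());
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        Map<String, String> regions = Map.of("region-1", "old-rs");
        System.out.println(retainAssignmentSketch(regions, new ArrayList<>()));
        System.out.println(retainAssignmentSketch(regions, List.of("rs-1")));
    }
}
```

The design choice here is to report "nothing assignable" rather than throw; whether the master should then wait for servers or fail fast is a separate decision this sketch does not make.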

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
