hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5873) TimeOut Monitor thread should be started after atleast one region server registers.
Date Wed, 25 Apr 2012 18:54:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261940#comment-13261940 ]

Lars Hofhansl commented on HBASE-5873:

This change does violate encapsulation a bit.
I double-checked where in the code we create instances of AssignmentManager. Besides the HMaster,
it is only created from tests (and they all pass, so it's good).

> TimeOut Monitor thread should be started after atleast one region server registers.
> -----------------------------------------------------------------------------------
>                 Key: HBASE-5873
>                 URL: https://issues.apache.org/jira/browse/HBASE-5873
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.6
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: rajeshbabu
>            Priority: Minor
>             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
>         Attachments: 5873-trunk.txt, HBASE-5873.patch
> Currently the timeout monitor thread is started even before any region server has registered with the master.
> The timeout monitor depends on region servers being online:
> {code}
> boolean allRSsOffline = this.serverManager.getOnlineServersList().
>         isEmpty();
> {code}
> Now when the master starts up it sees there are no online servers and hence sets 
> allRSsOffline to true.
> {code}
> setAllRegionServersOffline(allRSsOffline);
> {code}
> So this.allRegionServersOffline is also true.
> By this time an RS has come up.
> When the timeout monitor runs again in the next cycle (after 10 secs), it sees allRSsOffline as false.
> Hence
> {code}
> else if (this.allRegionServersOffline && !allRSsOffline) {
>             // if some RSs just came back online, we can start the
>             // the assignment right away
>             actOnTimeOut(regionState);
> {code}
> This condition makes it take action as if a timeout had occurred.
> Because of this, even if an assignment of ROOT is already in progress, this piece of code triggers another assignment, and we get a RegionAlreadyInTransitionException. We then need to wait 30 mins for ROOT itself to be assigned.
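
The sequence described in the issue can be reproduced with a small toy model. This is only a sketch of the flag logic quoted above, not HBase code: the class `TimeoutMonitorSketch`, the `simulate` method, and the `startMonitorAfterFirstRegistration` switch are hypothetical names made up for illustration, and the real monitor also checks per-region timestamps, which are left out here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the timeout monitor flag logic; all names here are hypothetical. */
public class TimeoutMonitorSketch {
    // Stand-in for the list returned by serverManager.getOnlineServersList().
    private final List<String> onlineServers = new ArrayList<>();
    // Mirrors this.allRegionServersOffline, remembered from the previous cycle.
    private boolean allRegionServersOffline = false;
    // Counts how often the "RSs just came back online" branch fires.
    private int timeoutActions = 0;

    void registerServer(String name) {
        onlineServers.add(name);
    }

    /** One monitor cycle for a single region in transition that has not timed out. */
    void chore() {
        boolean allRSsOffline = onlineServers.isEmpty();
        if (this.allRegionServersOffline && !allRSsOffline) {
            // Stands in for actOnTimeOut(regionState).
            timeoutActions++;
        }
        // Stands in for setAllRegionServersOffline(allRSsOffline).
        this.allRegionServersOffline = allRSsOffline;
    }

    /** Returns the number of spurious timeout actions for one startup sequence. */
    public static int simulate(boolean startMonitorAfterFirstRegistration) {
        TimeoutMonitorSketch monitor = new TimeoutMonitorSketch();
        if (!startMonitorAfterFirstRegistration) {
            // Monitor already running at master startup: no RS yet, flag becomes true.
            monitor.chore();
        }
        // An RS registers; the normal ROOT assignment would start here.
        monitor.registerServer("rs1");
        // Next monitor cycle.
        monitor.chore();
        return monitor.timeoutActions;
    }

    public static void main(String[] args) {
        System.out.println("monitor started early: " + simulate(false)); // prints 1
        System.out.println("monitor started late:  " + simulate(true));  // prints 0
    }
}
```

In this model, starting the monitor before the first registration yields one extra `actOnTimeOut`-style action on top of the assignment already in progress, while delaying the first cycle until a server has registered yields none, which is the behaviour the issue title asks for.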
