hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
Date Fri, 13 Jul 2012 04:17:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413482#comment-13413482
] 

Hadoop QA commented on HBASE-6389:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12536324/HBASE-6389_trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's
current 4 warnings).

    -1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit
warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.master.TestMasterFailover

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2379//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2379//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2379//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2379//console

This message is automatically generated.
                
> Modify the conditions to ensure that Master waits for sufficient number of Region Servers
before starting region assignments
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6389
>                 URL: https://issues.apache.org/jira/browse/HBASE-6389
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: HBASE-6389_trunk.patch
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in assuming that raising "hbase.master.wait.on.regionservers.mintostart"
from its default of 1 to a sufficient number would prevent all regions from being assigned to one
(or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior changed from 0.94.0 onwards
to address HBASE-4993.
> From 0.94.0 onwards, the Master proceeds as soon as the timeout has lapsed, even
if "hbase.master.wait.on.regionservers.mintostart" has not been reached.
> The current conditions in waitForRegionServers() make this clear:
> {code:title=ServerManager.java (trunk rev:1360470)}
> ....
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> ....
> ....
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> ....
> {code}
> So with the current conditions, the wait ends as soon as the timeout is reached, even
if fewer RSes than "hbase.master.wait.on.regionservers.mintostart" have checked in with the
Master, and the Master proceeds to assign regions among those RSes alone.
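
To make the point concrete, here is a minimal sketch that lifts the quoted loop condition into a pure function so it can be evaluated without a running Master. The class name, the method signature, and the numbers in main() are hypothetical and chosen only for illustration; they are not part of ServerManager.

```java
// Illustrative sketch only: mirrors the loop condition quoted from
// ServerManager.java; the class and parameter list are hypothetical.
final class CurrentWaitCondition {

    // Returns true while the Master would keep waiting for region servers.
    static boolean keepWaiting(boolean masterStopped, long slept, long timeout,
                               int count, int minToStart, int maxToStart,
                               long lastCountChange, long interval, long now) {
        return !masterStopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    public static void main(String[] args) {
        // Assumed scenario: the timeout (4500 ms) has just lapsed and only
        // 1 of the 10 required region servers has checked in.
        boolean waiting = keepWaiting(false, 4500L, 4500L,
                                      1, 10, Integer.MAX_VALUE,
                                      0L, 1500L, 4500L);
        System.out.println("keep waiting: " + waiting);
    }
}
```

Because `slept < timeout` is AND-ed with the rest of the condition, the function returns false in this scenario even though `count < minToStart` still holds.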
> As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB
is turned on.
> To enforce the quorum specified by "hbase.master.wait.on.regionservers.mintostart"
irrespective of the timeout, these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time AND
>    *   the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
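
The effect of the proposed condition can be sketched with a small simulation. This is not the patch itself: the class, the simulate() helper, the polling step, and the check-in timeline are all hypothetical, and the simulation assumes enough region servers eventually report in (otherwise the loop, like the proposed wait, would not end).

```java
// Illustrative sketch only: evaluates the proposed loop condition against a
// made-up timeline of region server check-ins. All names are hypothetical.
final class ProposedWaitCondition {

    // Proposed condition: the timeout alone no longer ends the wait.
    static boolean keepWaiting(boolean masterStopped, long slept, long timeout,
                               int count, int minToStart, int maxToStart,
                               long lastCountChange, long interval, long now) {
        return !masterStopped
            && count < maxToStart
            && (lastCountChange + interval > now
                || timeout > slept
                || count < minToStart);
    }

    // checkInTimes[i] is the assumed time (ms) at which a region server
    // reports in. Returns the number of servers seen when the wait ends.
    static int simulate(long[] checkInTimes, int minToStart, int maxToStart,
                        long timeout, long interval, long step) {
        long now = 0, slept = 0, lastCountChange = 0;
        int count = 0;
        while (keepWaiting(false, slept, timeout, count, minToStart,
                           maxToStart, lastCountChange, interval, now)) {
            now += step;      // stand-in for Thread.sleep(step)
            slept += step;
            int newCount = 0;
            for (long t : checkInTimes) {
                if (t <= now) {
                    newCount++;
                }
            }
            if (newCount != count) {
                count = newCount;
                lastCountChange = now;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Three servers report at 1 s, 6 s and 9 s; minToStart is 3,
        // timeout is 4.5 s, interval is 1.5 s, polling step is 0.5 s.
        int seen = simulate(new long[] {1000L, 6000L, 9000L},
                            3, Integer.MAX_VALUE, 4500L, 1500L, 500L);
        System.out.println("servers at end of wait: " + seen);
    }
}
```

With this timeline the wait continues past the 4500 ms timeout until all three servers have reported and no new server has arrived for one interval. Under the current condition quoted earlier, the same timeline would stop at the timeout with a single server.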

