hbase-issues mailing list archives

From "Jean-Daniel Cryans (Assigned) (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (HBASE-5639) The logic used in waiting for region servers during startup is broken
Date Tue, 27 Mar 2012 18:57:31 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-5639:
-----------------------------------------

    Assignee: Jean-Daniel Cryans  (was: nkeywal)

Here's what I see now with the patch:

{noformat}
2012-03-27 18:53:07,644 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:08,638 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r29s44,62023,1332874388301
2012-03-27 18:53:08,638 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r27s44,62023,1332874388324
2012-03-27 18:53:08,649 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 2, slept for 1005 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:08,656 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r5s38,62023,1332874388319
2012-03-27 18:53:08,657 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r6s38,62023,1332874388364
2012-03-27 18:53:08,662 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r8s38,62023,1332874388371
2012-03-27 18:53:08,699 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 5, slept for 1055 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:08,897 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r31s44,62023,1332874388453
2012-03-27 18:53:08,900 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 6, slept for 1256 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:09,602 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r30s44,62023,1332874388969
2012-03-27 18:53:09,603 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 7, slept for 1959 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:11,110 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 7, slept for 3466 ms, expecting minimum of 1, maximum of 2147483647, master is running.
{noformat}

It confirms it did the right thing, go wild Lars :)
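
For anyone reading the log later, here is a rough Java sketch of the arithmetic behind "it did the right thing". It only uses numbers copied from the log above (last registration at 18:53:09,602, finish at 18:53:11,110, interval 1500 ms, timeout 4500 ms, slept 3466 ms). The class and method names are made up for illustration and are not part of ServerManager.

```java
import java.time.Duration;
import java.time.LocalTime;

public class LogSettleCheck {
    /** Milliseconds between two wall-clock times on the same day. */
    static long millisBetween(LocalTime from, LocalTime to) {
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps copied from the log excerpt above.
        LocalTime lastRegistration = LocalTime.of(18, 53, 9, 602_000_000);
        LocalTime finished = LocalTime.of(18, 53, 11, 110_000_000);

        // Settings and "slept for" value as printed in the log lines.
        long intervalMs = 1500;
        long timeoutMs = 4500;
        long sleptMs = 3466;

        long quietMs = millisBetween(lastRegistration, finished);
        System.out.println("quiet period ms: " + quietMs);
        System.out.println("interval elapsed: " + (quietMs >= intervalMs));
        System.out.println("timeout not reached: " + (sleptMs < timeoutMs));
    }
}
```

The quiet period after the seventh registration is just over one interval, and the total sleep is still below the timeout, so the wait appears to have ended because the count settled rather than because the timeout fired.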
                
> The logic used in waiting for region servers during startup is broken
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5639
>                 URL: https://issues.apache.org/jira/browse/HBASE-5639
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.94.0
>
>         Attachments: HBASE-5639.patch
>
>
> See the tail of HBASE-4993, which I'll reproduce here:
> Me:
> {quote}
> I think a bug was introduced here. Here's the new waiting logic in waitForRegionServers:
> the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    no new region server has checked in for
>       'hbase.master.wait.on.regionservers.interval' time
> And the code that verifies that:
> !(lastCountChange+interval > now && count >= minToStart)
> {quote}
> Nic:
> {quote}
> It seems that changing the code to
> (count < minToStart ||
> lastCountChange+interval > now)
> would make the code work as documented.
> If you have 0 region servers that checked in and you are under the interval, you wait: (true or true) = true.
> If you have 0 region servers but you are above the interval, you wait: (true or false) = true.
> If you have 1 or more region servers that checked in and you are under the interval, you wait: (false or true) = true.
> {quote}
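
To make the comparison in the quoted comments concrete, the following is a minimal Java sketch that evaluates both expressions side by side. It assumes both are used as the "keep waiting" condition, which is how the quoted reply reads; the class name, method names, and sample values (minToStart of 1, interval of 1500 ms, last count change at time 0) are illustrative and not taken from the HBase source.

```java
public class WaitPredicate {
    /** Expression quoted in the issue as the existing check. */
    static boolean quotedOriginal(int count, int minToStart,
                                  long lastCountChange, long interval, long now) {
        return !(lastCountChange + interval > now && count >= minToStart);
    }

    /** Expression proposed in the quoted reply. */
    static boolean proposed(int count, int minToStart,
                            long lastCountChange, long interval, long now) {
        return count < minToStart || lastCountChange + interval > now;
    }

    public static void main(String[] args) {
        int minToStart = 1;        // assumed sample value
        long interval = 1500;      // assumed sample value, in ms
        long lastCountChange = 0;  // assumed: count last changed at time 0

        // Four combinations: 0 or 1 servers, under or above the interval.
        int[] counts = {0, 0, 1, 1};
        long[] nows = {1000, 2000, 1000, 2000};

        for (int i = 0; i < counts.length; i++) {
            boolean a = quotedOriginal(counts[i], minToStart, lastCountChange, interval, nows[i]);
            boolean b = proposed(counts[i], minToStart, lastCountChange, interval, nows[i]);
            System.out.printf("count=%d now=%d original=%b proposed=%b%n",
                    counts[i], nows[i], a, b);
        }
    }
}
```

With these sample values the two expressions agree while no server has checked in, and disagree in both cases once the minimum is reached: the original expression is false while still under the interval and true once above it, whereas the proposed one is true under the interval and false only after the minimum is met and the interval has passed.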


        
