hbase-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-8150) the code that handles RAITE on master in 0.94 should not always use the same plan
Date Wed, 20 Mar 2013 00:45:16 GMT
Sergey Shelukhin created HBASE-8150:
---------------------------------------

             Summary: the code that handles RAITE on master in 0.94 should not always use the same plan
                 Key: HBASE-8150
                 URL: https://issues.apache.org/jira/browse/HBASE-8150
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin
            Priority: Minor


The AssignmentManager (AM) code in 0.94 sets the region plan to point to the same server when
retrying an assignment that failed with RegionAlreadyInTransitionException (RAITE).
{code}
LOG.warn("Failed assignment of "
            + state.getRegion().getRegionNameAsString()
            + " to "
            + plan.getDestination()
            + ", trying to assign "
            + (regionAlreadyInTransitionException ? "to the same region server"
                + " because of RegionAlreadyInTransitionException;" : "elsewhere instead; ")
            + "retry=" + i, t);
{code}

However, there is no wait time in the loop that retries the assignment. If the region is still
being marked as failed to open, which may take some time, the master can easily exhaust its
retries in less than half a second (unfortunately I no longer have the logs), and the region
will be stuck.

Do you think this is worth fixing (for example, by not using the same server here after a
few retries, or by adding timed backoff in such cases)?
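
To make the two options concrete, here is a rough sketch of what they might look like. It is not the AM code: the class AssignRetrySketch, the methods backoffMillis and chooseServer, and all three constants are hypothetical names and values picked for illustration, and servers are plain strings rather than ServerName objects.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class AssignRetrySketch {
    static final int SAME_SERVER_RETRIES = 3;  // assumed threshold before giving up on the same server
    static final long BASE_SLEEP_MS = 100L;    // assumed first backoff delay
    static final long MAX_SLEEP_MS = 5000L;    // assumed cap on the backoff delay

    /** Exponential backoff for a zero-based retry number, capped at MAX_SLEEP_MS. */
    static long backoffMillis(int retry) {
        int shift = Math.min(Math.max(retry, 0), 20);  // clamp so the shift cannot overflow
        return Math.min(MAX_SLEEP_MS, BASE_SLEEP_MS << shift);
    }

    /**
     * Keep the current destination for the first few RAITE retries,
     * then move on to the next server in the list (wrapping around).
     */
    static String chooseServer(List<String> servers, String current, int retry, boolean raite) {
        if (raite && retry < SAME_SERVER_RETRIES) {
            return current;
        }
        if (servers.isEmpty()) {
            return current;  // nothing else to try
        }
        int next = (servers.indexOf(current) + 1) % servers.size();
        return servers.get(next);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
        String dest = "rs1";
        for (int i = 0; i < 5; i++) {
            dest = chooseServer(servers, dest, i, true);
            System.out.println("retry=" + i + " dest=" + dest
                + " sleepMs=" + backoffMillis(i));
        }
    }
}
```

In the real retry loop the delay would presumably be applied with something like Thread.sleep(backoffMillis(i)) before the next attempt, so that the region server has time to finish marking the region as failed to open.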

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
