hbase-issues mailing list archives

From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8150) the code that handles RAITE on master in 0.94 should not always use the same plan
Date Wed, 20 Mar 2013 04:15:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607261#comment-13607261 ]

chunhui shen commented on HBASE-8150:
-------------------------------------

Trunk already does what is suggested above:
{code}
if (Boolean.TRUE.equals(previous)) {
  // An open is in progress. This is supported, but let's log this.
  LOG.info("Receiving OPEN for the region:" +
      region.getRegionNameAsString() + " , which we are already trying to OPEN" +
      " - ignoring this new request for this region.");
}
{code}

In the 0.94 branch, we could likewise ignore the RegionAlreadyInTransitionException instead of throwing it to the master.
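
A minimal, self-contained sketch of that idea, assuming `previous` comes from a putIfAbsent-style map of regions currently being opened. The class `OpenRequestSketch`, its map, and its method names are hypothetical and are not the actual region server code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; not the HBase region server implementation.
public class OpenRequestSketch {
    // Region name -> TRUE while an open is in progress (assumed bookkeeping).
    private final ConcurrentMap<String, Boolean> inTransition =
        new ConcurrentHashMap<String, Boolean>();

    /** Returns true if this call started a new open, false if a duplicate was ignored. */
    public boolean requestOpen(String regionName) {
        Boolean previous = inTransition.putIfAbsent(regionName, Boolean.TRUE);
        if (Boolean.TRUE.equals(previous)) {
            // Duplicate OPEN: log and ignore instead of throwing back to the caller.
            System.out.println("Receiving OPEN for region " + regionName
                + ", which we are already trying to OPEN - ignoring this new request.");
            return false;
        }
        return true;
    }

    /** Marks the open as finished so that a later OPEN is accepted again. */
    public void finishOpen(String regionName) {
        inTransition.remove(regionName);
    }
}
```

With this shape a repeated OPEN from the master becomes a no-op on the region server, so the master never sees an exception it has to retry on.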
                
> the code that handles RAITE on master in 0.94 should not always use the same plan
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-8150
>                 URL: https://issues.apache.org/jira/browse/HBASE-8150
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Minor
>
> The code in 0.94 AM sets the region plan to point to the same server when retrying the assignment due to RAITE.
> {code}
> LOG.warn("Failed assignment of "
>             + state.getRegion().getRegionNameAsString()
>             + " to "
>             + plan.getDestination()
>             + ", trying to assign "
>             + (regionAlreadyInTransitionException ? "to the same region server"
>                 + " because of RegionAlreadyInTransitionException;" : "elsewhere instead; ")
>             + "retry=" + i, t);
> {code}
> However, there's no wait time in the loop that retries the assignment, and if the region is being marked as failed to open, which may take some time, the master can easily exhaust its retries in less than half a second, going to the same server every time and getting the same exception (unfortunately I no longer have the logs); the region will then be stuck.
> Do you think this is worth fixing (for example, by not using the same server here after a few retries, or by adding timed backoff in such cases)?
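
The two options raised in the description (stop reusing the same server after a few retries, and wait between retries) could be sketched roughly as follows. This is only an illustration: the class `AssignRetrySketch`, its methods, and all constants are hypothetical and are not taken from the 0.94 AssignmentManager:

```java
// Hypothetical sketch; names and constants are illustrative, not HBase's.
public class AssignRetrySketch {
    public static final int SAME_SERVER_RETRIES = 3; // assumed threshold
    public static final long BASE_SLEEP_MS = 100L;   // assumed base wait
    public static final long MAX_SLEEP_MS = 5000L;   // assumed cap

    /** Exponential backoff: BASE_SLEEP_MS * 2^attempt, capped at MAX_SLEEP_MS. */
    public static long backoffMillis(int attempt) {
        int shift = Math.min(Math.max(attempt, 0), 20); // bound the shift to avoid overflow
        return Math.min(MAX_SLEEP_MS, BASE_SLEEP_MS << shift);
    }

    /** Keep the same destination only for the first few RAITE retries. */
    public static boolean reuseSameServer(boolean regionAlreadyInTransition, int attempt) {
        return regionAlreadyInTransition && attempt < SAME_SERVER_RETRIES;
    }
}
```

A retry loop would call `Thread.sleep(backoffMillis(i))` before retry `i` and pick a new plan once `reuseSameServer(...)` returns false, so the retries are no longer spent within a fraction of a second against one server.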

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
