hbase-issues mailing list archives

From "Joseph (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16209) Provide an ExponentialBackOffPolicy sleep between failed region open requests
Date Tue, 02 Aug 2016 17:12:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404407#comment-15404407 ]

Joseph commented on HBASE-16209:

Oh sorry, I think a lot of these changes are in response to HBASE-16138, where we are working
under the assumption that many region opens will fail until the Replication Table regions
are up, which could take a bit of time. The sleep just gives us more control over how
long we keep retrying a region open, without flooding the RegionServer with requests.

In terms of the error, I think closedRegionHandler is sometimes used for closing/reassigning
regions that were not failed_open. Because of that, they would not have a failed_open counter,
so when we tried to call get() on the failed_open counter inside of invokeAssignLaterOnFailure()
we got an NPE, which prevented us from handling closed regions and led to timed-out
tests. I think the default initial and max sleep periods are also both set to 0 ms, so I don't
think it should slow down the tests that much. I ran a few of the failed tests on my laptop and
they passed, but I am still waiting on the Unit Tests. Do you have any comments/suggestions?
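
To make the NPE described above concrete, here is a minimal sketch of the failure mode and a null-safe lookup. It is not the code from the patch: the class name FailedOpenCounters, its methods, and the map layout are assumptions made purely for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a hypothetical stand-in for a per-region failed_open counter map. */
final class FailedOpenCounters {
    // Encoded region name -> number of failed open attempts recorded so far.
    private final ConcurrentMap<String, AtomicInteger> failedOpens = new ConcurrentHashMap<>();

    /** Records one more failed open for the region and returns the new count. */
    int recordFailedOpen(String encodedRegionName) {
        return failedOpens
            .computeIfAbsent(encodedRegionName, k -> new AtomicInteger())
            .incrementAndGet();
    }

    /** Mirrors the bug: throws NullPointerException for a region that never failed to open. */
    int attemptsUnsafe(String encodedRegionName) {
        return failedOpens.get(encodedRegionName).get();
    }

    /** Null-safe variant: a region without a counter is treated as having zero failed opens. */
    int attempts(String encodedRegionName) {
        AtomicInteger counter = failedOpens.get(encodedRegionName);
        return counter == null ? 0 : counter.get();
    }

    public static void main(String[] args) {
        FailedOpenCounters counters = new FailedOpenCounters();
        counters.recordFailedOpen("regionA");
        System.out.println(counters.attempts("regionA")); // region with a counter
        System.out.println(counters.attempts("regionB")); // closed region with no counter
    }
}
```

A region that reaches the closed-region path without ever being failed_open corresponds to "regionB" here: the unsafe lookup would throw, while the null-safe one reports zero attempts and lets handling continue.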

> Provide an ExponentialBackOffPolicy sleep between failed region open requests
> -----------------------------------------------------------------------------
>                 Key: HBASE-16209
>                 URL: https://issues.apache.org/jira/browse/HBASE-16209
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Joseph
>            Assignee: Joseph
>             Fix For: 2.0.0, 1.4.0
>         Attachments: HBASE-16209-addendum.patch, HBASE-16209-branch-1-addendum-v2.patch,
> HBASE-16209-branch-1-addendum.patch, HBASE-16209-branch-1.patch, HBASE-16209-v2.patch, HBASE-16209.patch
>
> Related to HBASE-16138. We currently have no pause between retries of failed
> region open requests, and with a low maximumAttempt default we can quickly use up all our
> regionOpen retries if the server is in a bad state. I added an ExponentialBackOffPolicy
> so that we spread out the timing of our open region retries in AssignmentManager. Review board
> at https://reviews.apache.org/r/50011/
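
As a rough illustration of the idea in the description, the sketch below computes a capped exponential sleep between open-region retries. It is a standalone example under stated assumptions, not the patch itself: the class name OpenRegionBackoff, the doubling factor, and the parameter names initialSleepMs and maxSleepMs are all hypothetical.

```java
/** Illustrative only: capped exponential backoff between region open retries. */
final class OpenRegionBackoff {
    private final long initialSleepMs; // pause before the first retry
    private final long maxSleepMs;     // upper bound on any single pause

    OpenRegionBackoff(long initialSleepMs, long maxSleepMs) {
        if (initialSleepMs < 0 || maxSleepMs < 0) {
            throw new IllegalArgumentException("sleep periods must be non-negative");
        }
        this.initialSleepMs = initialSleepMs;
        this.maxSleepMs = maxSleepMs;
    }

    /**
     * Returns the pause before retry number {@code attempt} (0-based):
     * the initial sleep doubled once per earlier attempt, capped at maxSleepMs.
     */
    long sleepMillis(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be non-negative");
        }
        long sleep = Math.min(initialSleepMs, maxSleepMs);
        if (sleep == 0) {
            return 0; // an initial or max sleep of 0 ms disables the pause entirely
        }
        for (int i = 0; i < attempt; i++) {
            if (sleep > maxSleepMs / 2) {
                return maxSleepMs; // doubling would exceed the cap (and could overflow)
            }
            sleep *= 2;
        }
        return sleep;
    }

    public static void main(String[] args) {
        OpenRegionBackoff backoff = new OpenRegionBackoff(100, 1000);
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + ": sleep " + backoff.sleepMillis(attempt) + " ms");
        }
    }
}
```

With both periods left at 0 ms the method always returns 0, which keeps the original no-pause behaviour; raising them spreads the retries out instead of spending the whole retry budget immediately.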

This message was sent by Atlassian JIRA
