hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7937) Retry log rolling to support HA NN scenario
Date Thu, 28 Feb 2013 01:40:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589034#comment-13589034 ]

Himanshu Vashishtha commented on HBASE-7937:

Thanks for taking a look.

bq. + private int logRollRetryCount;
Yes, they are set in the constructor; I will make them final, and fall back to the default
when the configured value is <= 0.
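
A minimal sketch of that fallback, assuming a small holder class. The class name, the pause field, and the default retry count of 3 are hypothetical and not taken from the patch; the 1 sec pause mirrors HConstants#DEFAULT_HBASE_SERVER_PAUSE as described in this comment.

```java
// Hypothetical holder for the retry settings; not the code from the patch.
final class RollRetryConfig {
    // Assumed default retry count, chosen only for illustration.
    static final int DEFAULT_RETRY_COUNT = 3;
    // 1 sec, mirroring the default pause mentioned in this thread.
    static final long DEFAULT_PAUSE_MS = 1000L;

    // Final fields, assigned exactly once in the constructor.
    final int logRollRetryCount;
    final long pauseMs;

    RollRetryConfig(int configuredRetries, long configuredPauseMs) {
        // Fall back to the defaults when the configured value is <= 0.
        this.logRollRetryCount =
            configuredRetries <= 0 ? DEFAULT_RETRY_COUNT : configuredRetries;
        this.pauseMs =
            configuredPauseMs <= 0 ? DEFAULT_PAUSE_MS : configuredPauseMs;
    }
}
```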

bq. // there may be a case when fs has just become available; one can do one more retry
I was considering the case where an HA NN recovers between the moment an operation fails and
the moment we check the filesystem via the FSUtils#checkFSAvailable call. In that case we end
up in a state where, for example, fs.rename() threw an exception but the fs is reported
healthy, so we rethrow the exception to the caller. Ideally, it should have done one more
retry.
I tried to cover that case with the fsOk variable. If you think this is not needed, I will
remove it.
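
To make the race concrete, here is a rough sketch of a retry loop with one extra attempt when the filesystem looks healthy right after a failure. It is not the code from the patch: RetryingFsOp, FsOp, FsCheck, and runWithRetries are hypothetical names, and FsCheck merely stands in for the FSUtils#checkFSAvailable call.

```java
import java.io.IOException;

// Hypothetical sketch of the retry logic discussed above; not the actual patch.
final class RetryingFsOp {
    /** A filesystem operation that may fail, e.g. a rename. */
    interface FsOp { void run() throws IOException; }

    /** Stand-in for the FSUtils#checkFSAvailable check. */
    interface FsCheck { boolean isAvailable(); }

    /** Runs op and returns the number of attempts made; rethrows the last failure. */
    static int runWithRetries(FsOp op, FsCheck check, int maxRetries, long pauseMs)
            throws IOException, InterruptedException {
        int failures = 0;
        boolean retriedAfterRecovery = false;
        while (true) {
            try {
                op.run();
                return failures + 1;
            } catch (IOException e) {
                failures++; // counted once per failed attempt
                boolean fsOk = check.isAvailable();
                if (fsOk) {
                    // The fs may have recovered between the failure and the check:
                    // allow exactly one more attempt before giving up.
                    if (retriedAfterRecovery) {
                        throw e;
                    }
                    retriedAfterRecovery = true;
                    continue;
                }
                if (failures > maxRetries) {
                    throw e;
                }
                Thread.sleep(pauseMs); // fs still unavailable: pause, then retry
            }
        }
    }
}
```

Without the extra attempt, an operation that failed just before the NN came back would be rethrown even though a retry would likely succeed.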

bq. incrementing twice. 
Sorry about that. I will fix this.

bq. Default pause time:
1 sec; as defined in HConstants#DEFAULT_HBASE_SERVER_PAUSE

bq. Are we holding up all writes when we are paused like this?
I don't think we are. We retry in two places here:
a) Creating a new log writer
b) Archiving old logs
As long as we haven't created a new writer, we don't replace the old log writer. So, we are
still pointing to the old hlog.
Archiving old logs shouldn't be a blocking call. If it is, it is a bug.
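
A simplified sketch of why writes are not held up while writer creation is retried: the current writer is swapped only after a new one has been created successfully. RollableLog, Writer, and WriterFactory are hypothetical names for illustration; the real HLog involves locking and syncing that this sketch omits.

```java
import java.io.IOException;

// Hypothetical sketch; not the HLog implementation.
final class RollableLog {
    interface Writer { void append(String entry) throws IOException; }
    interface WriterFactory { Writer create() throws IOException; }

    // Points at the old writer until a replacement exists.
    private volatile Writer current;

    RollableLog(Writer initial) {
        this.current = initial;
    }

    void append(String entry) throws IOException {
        current.append(entry);
    }

    Writer currentWriter() {
        return current;
    }

    /** Tries to roll to a new writer; returns false if every attempt failed. */
    boolean roll(WriterFactory factory, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                Writer next = factory.create();
                current = next; // swap only after creation succeeded
                return true;
            } catch (IOException e) {
                // Creation failed; 'current' still references the old writer.
            }
        }
        return false;
    }
}
```

If every attempt fails, appends keep going to the old writer, which matches the behaviour described above.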

bq. refactoring..
Will do.

TestHLogSplit passes locally. I didn't change the LogSplitter code. Tried to keep its scope
> Retry log rolling to support HA NN scenario
> -------------------------------------------
>                 Key: HBASE-7937
>                 URL: https://issues.apache.org/jira/browse/HBASE-7937
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.5
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.95.0
>         Attachments: HBASE-7937-trunk.patch, HBASE-7937-v1.patch
> A failure in log rolling causes a regionserver abort. In the case of an HA NN, it would be
> good to have a retry mechanism for rolling the logs.
> A corresponding jira for MemStore retries is HBASE-7507.
