hbase-issues mailing list archives

From "Allan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18132) Low replication should be checked in period in case of datanode rolling upgrade
Date Wed, 31 May 2017 01:35:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030481#comment-16030481 ]

Allan Yang commented on HBASE-18132:

> How is the default value of 30 seconds determined?
The exact value doesn't matter; the only requirement is that the low replication check
interval is shorter than the interval between datanode restarts. In our case, datanodes are
restarted 1 minute apart during a rolling restart, so we set the check interval to 30 seconds.
Thanks for your advice, [~tedyu]. I will modify the patch and upload a master patch later.

> Low replication should be checked in period in case of datanode rolling upgrade
> -------------------------------------------------------------------------------
>                 Key: HBASE-18132
>                 URL: https://issues.apache.org/jira/browse/HBASE-18132
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.0, 1.1.10
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>         Attachments: HBASE-18132-branch-1.patch
> For now, we only check for low replication of WALs when there is a sync operation (HBASE-2234),
> rolling the log if the WAL has fewer replicas than configured. But if the WAL has very
> few writes or no writes at all, low replication will not be detected and thus the log will
> not be rolled.
> That is a problem during a rolling upgrade of datanodes: every datanode holding a replica
> of a WAL with no writes will be restarted, leaving the WAL file in an abnormal state. Later
> attempts to open this file will always fail.
> I have put up a patch to check for low replication of WALs at a configured period. During a
> rolling upgrade of datanodes, we just need to make sure the restart interval between two nodes
> is longer than the low replication check period, and the WAL will be closed and rolled
> normally. A UT in the patch demonstrates this.
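
The periodic check described above could be sketched roughly as follows in Java. This is an illustration only, not the code in the attached patch: the class name PeriodicLowReplicationChecker, the liveReplicas supplier, the rollLog callback, and the isSafeRestartInterval helper are all hypothetical stand-ins for the real WAL internals.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical sketch; names and structure are assumptions, not the HBASE-18132 patch. */
class PeriodicLowReplicationChecker {
    private final IntSupplier liveReplicas; // assumed hook: current replica count of the WAL
    private final int minReplicas;          // configured minimum number of replicas
    private final Runnable rollLog;         // assumed hook: closes the WAL and opens a new one

    PeriodicLowReplicationChecker(IntSupplier liveReplicas, int minReplicas, Runnable rollLog) {
        this.liveReplicas = liveReplicas;
        this.minReplicas = minReplicas;
        this.rollLog = rollLog;
    }

    /** Runs one check; rolls the log and returns true if replication is too low. */
    boolean checkOnce() {
        if (liveReplicas.getAsInt() < minReplicas) {
            rollLog.run();
            return true;
        }
        return false;
    }

    /** Schedules the check on a timer, so it no longer depends on a sync happening. */
    ScheduledFuture<?> start(ScheduledExecutorService scheduler, long periodSeconds) {
        return scheduler.scheduleAtFixedRate(
            () -> { checkOnce(); }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** The requirement from the comment: the check period must be shorter than the restart interval. */
    static boolean isSafeRestartInterval(long checkPeriodSeconds, long restartIntervalSeconds) {
        return checkPeriodSeconds < restartIntervalSeconds;
    }
}
```

With the values mentioned in the comment, a 30 second check period and a 1 minute restart interval satisfy isSafeRestartInterval(30, 60), so at least one check should run between two consecutive datanode restarts.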

