hadoop-hdfs-issues mailing list archives

From "Ekanth Sethuramalingam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14317) Standby does not trigger edit log rolling when in-progress edit log tailing is enabled
Date Fri, 01 Mar 2019 05:57:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781318#comment-16781318 ]

Ekanth Sethuramalingam commented on HDFS-14317:

{quote}That's not true, they both accept time units. You can do something like "10ms" and
it will parse it properly. This is only in Hadoop 3+
{quote}
Thanks [~xkrogen] for pointing this out. However, when I looked deeper into the code, I found
that the value is converted to seconds, which loses any sub-second precision.
{code}
    logRollPeriodMs = conf.getTimeDuration(
        DFSConfigKeys.DFS_HA_LOGROLL_PERIOD_DEFAULT, TimeUnit.SECONDS) * 1000;
    sleepTimeMs = conf.getTimeDuration(
{code}
I guess I'll leave it as it is. I'll upload a new patch in a bit.
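
To illustrate the precision loss described above, here is a minimal standalone sketch. It does not call Hadoop; it assumes that reading a duration at SECONDS granularity truncates the same way java.util.concurrent.TimeUnit.convert does, and the class and method names (SecondsRoundTrip, viaSeconds) are hypothetical.

```java
import java.util.concurrent.TimeUnit;

class SecondsRoundTrip {
    // Assumption: mimics reading a duration in whole seconds and then
    // scaling it back to milliseconds, as the "* 1000" in the snippet does.
    static long viaSeconds(long configuredMs) {
        long wholeSeconds = TimeUnit.SECONDS.convert(configuredMs, TimeUnit.MILLISECONDS);
        return wholeSeconds * 1000;
    }

    public static void main(String[] args) {
        // A value such as "10ms" or "1500ms" cannot survive the round trip.
        for (long ms : new long[] {10, 1500, 120_000}) {
            System.out.println(ms + " ms configured -> " + viaSeconds(ms) + " ms effective");
        }
    }
}
```

Under this assumption, any value that is not a whole number of seconds is rounded down, so a sub-second period would collapse to zero.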

> Standby does not trigger edit log rolling when in-progress edit log tailing is enabled
> --------------------------------------------------------------------------------------
>                 Key: HDFS-14317
>                 URL: https://issues.apache.org/jira/browse/HDFS-14317
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.9.0, 3.0.0
>            Reporter: Ekanth Sethuramalingam
>            Assignee: Ekanth Sethuramalingam
>            Priority: Critical
>         Attachments: HDFS-14317.001.patch
> The standby uses the following method to check whether it is time to trigger edit log
> rolling on the active.
> {code}
>   /**
>    * @return true if the configured log roll period has elapsed.
>    */
>   private boolean tooLongSinceLastLoad() {
>     return logRollPeriodMs >= 0 && 
>       (monotonicNow() - lastLoadTimeMs) > logRollPeriodMs ;
>   }
> {code}
> In doTailEdits(), lastLoadTimeMs is updated whenever the standby successfully tails
> any edits.
> {code}
>       if (editsLoaded > 0) {
>         lastLoadTimeMs = monotonicNow();
>       }
> {code}
> The default value of {{dfs.ha.log-roll.period}} is 120 seconds and the default value of
> {{dfs.ha.tail-edits.period}} is 60 seconds. With in-progress edit log tailing enabled, the
> standby loads new edits on nearly every tailing cycle, so lastLoadTimeMs is refreshed about
> every 60 seconds and the elapsed time rarely exceeds 120 seconds. As a result,
> tooLongSinceLastLoad() almost never returns true, and the edit log is not rolled for a long
> time, until {{dfs.namenode.edit.log.autoroll.multiplier.threshold}} takes effect.
> [In our deployment, this resulted in in-progress edit logs getting deleted. The sequence
> of events was that the standby was able to checkpoint twice while the in-progress edit log
> was growing on the active. When the NNStorageRetentionManager decided to clean up old
> checkpoints and edit logs, it removed the in-progress edit log from the active and QJM (as
> the txnid on the in-progress edit log was older than the 2 most recent checkpoints),
> irrecoverably losing a few minutes' worth of metadata].
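
The interaction between the two default periods described above can be seen in a small simulation. This is a simplified sketch, not the EditLogTailer code: it reuses only the tooLongSinceLastLoad() predicate from the description, passes the clock in explicitly, and assumes that (a) with in-progress tailing the standby loads edits on every cycle, and (b) without it the standby loads edits only in a cycle where a roll has just finalized a segment. The class name and the countRollTriggers helper are hypothetical.

```java
class RollCheckSimulation {
    static final long ROLL_PERIOD_MS = 120_000; // default dfs.ha.log-roll.period
    static final long TAIL_PERIOD_MS = 60_000;  // default dfs.ha.tail-edits.period

    // Same predicate as in the description, with the clock passed in.
    static boolean tooLongSinceLastLoad(long nowMs, long lastLoadTimeMs) {
        return ROLL_PERIOD_MS >= 0 && (nowMs - lastLoadTimeMs) > ROLL_PERIOD_MS;
    }

    // Counts how many times a roll would be triggered over the given number
    // of tailing cycles, under the assumptions stated in the lead-in.
    static int countRollTriggers(int cycles, boolean inProgressTailing) {
        long lastLoadTimeMs = 0;
        int triggers = 0;
        for (int i = 1; i <= cycles; i++) {
            long nowMs = i * TAIL_PERIOD_MS;
            boolean rolled = tooLongSinceLastLoad(nowMs, lastLoadTimeMs);
            if (rolled) {
                triggers++;
            }
            // Assumption: edits are visible every cycle with in-progress
            // tailing, otherwise only right after a roll finalizes a segment.
            boolean editsLoaded = inProgressTailing || rolled;
            if (editsLoaded) {
                lastLoadTimeMs = nowMs;
            }
        }
        return triggers;
    }

    public static void main(String[] args) {
        System.out.println("in-progress tailing on:  " + countRollTriggers(10, true));
        System.out.println("in-progress tailing off: " + countRollTriggers(10, false));
    }
}
```

In this model, ten cycles with in-progress tailing enabled never trigger a roll, because the 60-second refresh of lastLoadTimeMs always stays below the 120-second threshold, whereas the same ten cycles without it trigger a roll on every third cycle.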

This message was sent by Atlassian JIRA
