hadoop-hdfs-issues mailing list archives

From "Ekanth Sethuramalingam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14317) Standby does not trigger edit log rolling when in-progress edit log tailing is enabled
Date Fri, 01 Mar 2019 22:02:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782142#comment-16782142 ]

Ekanth Sethuramalingam commented on HDFS-14317:
-----------------------------------------------

New patch [^HDFS-14317.004.patch] fixes the {{hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits}}
test failures.

> Standby does not trigger edit log rolling when in-progress edit log tailing is enabled
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-14317
>                 URL: https://issues.apache.org/jira/browse/HDFS-14317
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.9.0, 3.0.0
>            Reporter: Ekanth Sethuramalingam
>            Assignee: Ekanth Sethuramalingam
>            Priority: Critical
>         Attachments: HDFS-14317.001.patch, HDFS-14317.002.patch, HDFS-14317.003.patch,
HDFS-14317.004.patch
>
>
> The standby uses the following method to check whether it is time to trigger edit log rolling on the active.
> {code}
>   /**
>    * @return true if the configured log roll period has elapsed.
>    */
>   private boolean tooLongSinceLastLoad() {
>     return logRollPeriodMs >= 0 && 
>       (monotonicNow() - lastLoadTimeMs) > logRollPeriodMs ;
>   }
> {code}
> In doTailEdits(), lastLoadTimeMs is updated whenever the standby successfully tails any edits:
> {code}
>       if (editsLoaded > 0) {
>         lastLoadTimeMs = monotonicNow();
>       }
> {code}
> The default for {{dfs.ha.log-roll.period}} is 120 seconds and the default for {{dfs.ha.tail-edits.period}} is 60 seconds. With in-progress edit log tailing enabled, the standby loads edits on almost every tail, so lastLoadTimeMs is refreshed roughly every 60 seconds and tooLongSinceLastLoad() almost never returns true. As a result, edit logs are not rolled for a long time, until {{dfs.namenode.edit.log.autoroll.multiplier.threshold}} takes effect.
> [In our deployment, this resulted in in-progress edit logs getting deleted. The sequence of events was that the standby was able to checkpoint twice while the in-progress edit log was still growing on the active. When the NNStorageRetentionManager decided to clean up old checkpoints and edit logs, it deleted the in-progress edit log from the active and QJM (as the txnid on the in-progress edit log was older than the 2 most recent checkpoints), resulting in irrecoverably losing a few minutes' worth of metadata].
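
The interaction described above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Hadoop code: the class EditLogRollSketch, the method countRollTriggers and its parameters are hypothetical names, the clock is simulated, and the active is assumed to write edits continuously. The roll check mirrors the tooLongSinceLastLoad() condition quoted in the description, evaluated before each tail.

```java
// Hypothetical simulation of the standby's tail loop; not taken from Hadoop.
final class EditLogRollSketch {

    /**
     * Counts how often the standby would ask the active to roll its edit log.
     * Assumes the active writes edits continuously for the whole duration.
     */
    static int countRollTriggers(long logRollPeriodMs, long tailPeriodMs,
                                 long durationMs, boolean inProgressTailing) {
        long now = 0;                  // simulated monotonic clock
        long lastLoadTimeMs = 0;
        boolean finalizedSegmentReady = false;
        int rollTriggers = 0;

        while (now + tailPeriodMs <= durationMs) {
            now += tailPeriodMs;       // sleep until the next tail

            // Same condition as tooLongSinceLastLoad() in the description.
            boolean tooLong = logRollPeriodMs >= 0
                && (now - lastLoadTimeMs) > logRollPeriodMs;
            if (tooLong) {
                rollTriggers++;
                finalizedSegmentReady = true;  // a roll finalizes the segment
            }

            // Stand-in for doTailEdits(): with in-progress tailing the standby
            // always finds edits; otherwise only after a segment is finalized.
            boolean editsLoaded = inProgressTailing || finalizedSegmentReady;
            if (editsLoaded) {
                lastLoadTimeMs = now;
                finalizedSegmentReady = false;
            }
        }
        return rollTriggers;
    }

    public static void main(String[] args) {
        long roll = 120_000L;          // dfs.ha.log-roll.period default
        long tail = 60_000L;           // dfs.ha.tail-edits.period default
        long oneHour = 3_600_000L;
        System.out.println("rolls without in-progress tailing: "
            + countRollTriggers(roll, tail, oneHour, false));
        System.out.println("rolls with in-progress tailing: "
            + countRollTriggers(roll, tail, oneHour, true));
    }
}
```

With the default periods, the elapsed time since the last load never exceeds the tail period when in-progress tailing is on, so the check never fires in this model; without it, the elapsed time keeps growing until a roll makes a finalized segment available.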



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

