hadoop-hdfs-issues mailing list archives

From "star (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
Date Sat, 20 Apr 2019 00:56:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822296#comment-16822296 ]

star commented on HDFS-14378:

The failed test passes locally.

> Simplify the design of multiple NN and both logic of edit log roll and checkpoint
> ---------------------------------------------------------------------------------
>                 Key: HDFS-14378
>                 URL: https://issues.apache.org/jira/browse/HDFS-14378
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.1.2
>            Reporter: star
>            Assignee: star
>            Priority: Minor
>              Labels: patch
>         Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, HDFS-14378-trunk.003.patch,
HDFS-14378-trunk.004.patch, HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch
>       HDFS-6440 introduced a mechanism to support more than 2 NNs. It implements a
first-writer-wins policy to avoid duplicate fsimage downloads. The variable 'isPrimaryCheckPointer'
holds the first-writer state; the SNN holding that state will provide the fsimage to the ANN next time.
As a result, there are three roles in an NN cluster: the ANN, one primary SNN, and one or more normal SNNs.
>       Since HDFS-12248, there may be more than two primary SNNs shortly after an exception
occurs. That change handles a scenario in which an SNN does not upload the fsimage on IOE and Interrupted
exceptions. Although this does not cause any further functional issues, it is inconsistent.
>       Furthermore, the edit log may be rolled more frequently than necessary with multiple
standby NameNodes; see HDFS-14349. (I'm not sure about this; I will verify it with unit tests, or
anyone is welcome to point it out.)
>       Given the above, I'm wondering whether we could simplify the design with the following changes:
>  * There are only two roles: ANN and SNN.
>  * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
>  * The ANN will select an SNN from which to download the checkpoint.
> The SNN will just tail the edit log and checkpoint, then provide a servlet for fsimage downloading
as normal. The SNN will not try to roll the edit log or send a checkpoint request to the ANN.
> In short, the ANN will be more active. Suggestions are welcome.
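
The proposed two-role design can be illustrated with a minimal, self-contained sketch. This is not Hadoop code and not the attached patch: the class and method names are hypothetical, the 120-second period is only an example value standing in for the DFS_HA_LOGROLL_PERIOD_KEY setting, and the rule for choosing an SNN (the one with the newest checkpoint) is an assumption, since the proposal does not say how the ANN selects one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the proposed ANN-driven design; not Hadoop code. */
final class ActiveDrivenCheckpointSketch {

    /** Standby NN: only tails edits and checkpoints; it never contacts the ANN. */
    static final class Standby {
        final String name;
        final long lastCheckpointTxId; // highest txid covered by its latest fsimage

        Standby(String name, long lastCheckpointTxId) {
            this.name = name;
            this.lastCheckpointTxId = lastCheckpointTxId;
        }
    }

    /** Active NN: rolls its own edit log and pulls a checkpoint from one standby. */
    static final class Active {
        final long logRollPeriodMs; // stands in for the DFS_HA_LOGROLL_PERIOD_KEY setting
        long lastRollTimeMs;
        int rolls;
        long fsimageTxId;

        Active(long logRollPeriodMs, long nowMs) {
            this.logRollPeriodMs = logRollPeriodMs;
            this.lastRollTimeMs = nowMs;
        }

        /** Rolls the edit log only when a full period has elapsed since the last roll. */
        boolean maybeRollEditLog(long nowMs) {
            if (nowMs - lastRollTimeMs < logRollPeriodMs) {
                return false;
            }
            lastRollTimeMs = nowMs;
            rolls++;
            return true;
        }

        /** Assumed selection policy: pick the standby with the newest checkpoint. */
        Standby selectStandby(List<Standby> standbys) {
            Standby best = null;
            for (Standby s : standbys) {
                if (best == null || s.lastCheckpointTxId > best.lastCheckpointTxId) {
                    best = s;
                }
            }
            return best;
        }

        /** Downloads from the selected standby only if its checkpoint is newer. */
        boolean downloadCheckpoint(List<Standby> standbys) {
            Standby chosen = selectStandby(standbys);
            if (chosen == null || chosen.lastCheckpointTxId <= fsimageTxId) {
                return false;
            }
            fsimageTxId = chosen.lastCheckpointTxId;
            return true;
        }
    }

    public static void main(String[] args) {
        Active ann = new Active(120_000L, 0L); // example period: 120 seconds
        List<Standby> snns = new ArrayList<>();
        snns.add(new Standby("snn1", 100L));
        snns.add(new Standby("snn2", 250L));

        System.out.println("rolled at 60s: " + ann.maybeRollEditLog(60_000L));
        System.out.println("rolled at 120s: " + ann.maybeRollEditLog(120_000L));
        System.out.println("downloaded: " + ann.downloadCheckpoint(snns));
        System.out.println("fsimage txid: " + ann.fsimageTxId);
    }
}
```

In this sketch only the ANN initiates work, so the number of standbys does not affect how often the edit log is rolled, which is the property the proposal is after.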

This message was sent by Atlassian JIRA
