hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5037) Active NN should trigger its own edit log rolls
Date Thu, 31 Oct 2013 22:37:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810825#comment-13810825 ]

Aaron T. Myers commented on HDFS-5037:
--------------------------------------

Patch looks pretty good to me, thanks a lot for taking this up.

Two comments:

# How did you decide on a default of 20MM (20 million) edits? That seems like it might be
a little high to me. I suggest we instead make it a multiplier of the configured number of
edits at which the standby/secondary would roll if it were working properly, and default
this multiplier to 2 or 3.
# In the event of an error, we currently exit the NN edit log roller thread. Perhaps we
should instead move the try/catch inside of the run loop, and only exit the thread in the
case of InterruptedException?
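
To make both suggestions concrete, here is a rough Java sketch. It is not the code from the attached patches: the class name, the constructor parameters, and the `txnsSinceLastRoll` / `rollAction` hooks are all hypothetical stand-ins for whatever the NN actually exposes. The threshold is derived from a multiplier, and a failed roll attempt is caught inside the loop so that only an interrupt ends the thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/** Illustrative only; every name here is hypothetical, not taken from the patch. */
final class EditLogRollerSketch implements Runnable {
    private final long rollThreshold;
    private final long sleepMillis;
    private final LongSupplier txnsSinceLastRoll; // stand-in for querying the edit log
    private final Runnable rollAction;            // stand-in for the actual roll call

    EditLogRollerSketch(long rollThreshold, long sleepMillis,
                        LongSupplier txnsSinceLastRoll, Runnable rollAction) {
        this.rollThreshold = rollThreshold;
        this.sleepMillis = sleepMillis;
        this.txnsSinceLastRoll = txnsSinceLastRoll;
        this.rollAction = rollAction;
    }

    /** Comment 1: threshold = multiplier x the standby's configured checkpoint txn count. */
    static long rollThreshold(long checkpointTxns, int multiplier) {
        return Math.multiplyExact(checkpointTxns, (long) multiplier);
    }

    @Override
    public void run() {
        while (true) {
            try {
                if (txnsSinceLastRoll.getAsLong() > rollThreshold) {
                    rollAction.run();
                }
            } catch (RuntimeException e) {
                // Comment 2: log the failure and retry on the next pass.
                System.err.println("Edit log roll attempt failed, will retry: " + e);
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // the only path that ends the roller thread
            }
        }
    }

    /** Runs the roller with a roll action that fails once; returns the attempt count. */
    static int runDemo() {
        AtomicInteger attempts = new AtomicInteger();
        CountDownLatch rolled = new CountDownLatch(1);
        Runnable flakyRoll = () -> {
            if (attempts.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated roll failure");
            }
            rolled.countDown();
        };
        Thread t = new Thread(
            new EditLogRollerSketch(rollThreshold(5, 2), 1, () -> 11L, flakyRoll));
        t.start();
        try {
            rolled.await(5, TimeUnit.SECONDS);
            t.interrupt();
            t.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.isAlive() ? -1 : attempts.get();
    }
}
```

In `runDemo()` the first roll attempt throws, yet the thread survives and rolls on a later pass; it stops only once it is interrupted. Sleeping outside the roll try block also keeps a persistently failing roll from spinning the loop.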

> Active NN should trigger its own edit log rolls
> -----------------------------------------------
>
>                 Key: HDFS-5037
>                 URL: https://issues.apache.org/jira/browse/HDFS-5037
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ha, namenode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Todd Lipcon
>            Assignee: Andrew Wang
>            Priority: Critical
>         Attachments: hdfs-5037-1.patch, hdfs-5037-2.patch
>
>
> We've seen cases where the SBN/2NN went down, and then users accumulated very very large
> edit log segments. This causes a slow startup time because the last edit log segment must
> be read fully to recover it before the NN can start up again. Additionally, in the case of
> QJM, it can trigger timeouts on recovery or edit log syncing because the very-large segment
> has to get processed within a certain time bound.
> We could easily improve this by having the NN trigger its own edit log rolls on a configurable
> size (eg every 256MB)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
