hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7991) Allow users to skip checkpoint when stopping NameNode
Date Fri, 22 May 2015 17:45:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556496#comment-14556496 ]

Allen Wittenauer commented on HDFS-7991:

bq. Ideally when 2NN or standby is working. But we have had many issues where checkpointing
is not done by SNN or standby, for the following reasons:

OK, so these are not new issues at all and have been around for literally years (decade now?).
We had it happen at Y! back in 2007 and it's a story I often tell during talks. 

bq. We need a way to be able to save namespace. 

Then fix the NN<->2NN relationship to provide better alerting when stuff goes wrong.
Hacking the shell code (and, yes, the code in branch-2 and in trunk are clearly hacks. Heck,
the branch-2 version doesn't even trigger if you are running the NN in non-daemon mode...) is
completely the wrong thing to do.

... and, as has been pointed out, this hack does NOTHING to help in the case of hardware failure,
when you want it most.

bq. Today operators who understand this situation do save namespace manually before stopping
the namenode.

I don't think I can put enough lol's in here to express how many laughs this statement got
from around the office. No, operators who understand this issue monitor the size of the edits
file and the 2NN and then act appropriately.  We don't do safemode->checkpoint->shutdown
on every NN bring down.
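
As a rough illustration of the "monitor the size of the edits file" approach, here is a minimal Python sketch. It assumes the usual on-disk layout in which edit-log segments live under <name dir>/current with names starting with "edits_". The function names, the example path, and the threshold are hypothetical, and a real setup would also watch the 2NN's last checkpoint time.

```python
from pathlib import Path


def edits_bytes(name_dir):
    """Sum the sizes of edit-log segments under <name_dir>/current.

    Assumes segments are regular files named edits_* (an assumption about
    the NameNode storage layout, not something stated in this thread).
    """
    current = Path(name_dir) / "current"
    return sum(p.stat().st_size for p in current.glob("edits_*") if p.is_file())


def needs_attention(name_dir, threshold_bytes):
    """Return True when the accumulated edits exceed the chosen threshold."""
    return edits_bytes(name_dir) > threshold_bytes
```

A cron job or monitoring check could call needs_attention("/data/dfs/name", 1 << 30) and raise an alert, leaving the decision to checkpoint to the operator rather than to the shutdown script.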

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991-shellpart.patch, HDFS-7991.000.patch, HDFS-7991.001.patch,
>                      HDFS-7991.002.patch, HDFS-7991.003.patch, HDFS-7991.004.patch
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to check if saving
> namespace is necessary before stopping namenode. As [~kihwal] pointed out in this [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
> in a secured cluster this new functionality requires the user to be kinit'ed.
