hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7991) Allow users to skip checkpoint when stopping NameNode
Date Thu, 21 May 2015 23:49:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555300#comment-14555300 ]

Allen Wittenauer commented on HDFS-7991:
----------------------------------------

bq. Then if we also let this java program send out the checkpoint check command, and considering
our current RPC already has the capability to handle timeout and retry, I guess we can directly
utilize the current saveNamespace RPC?

I would keep it simple:  shutdown also triggers the logic that decides whether a checkpoint
is necessary. There's zero value in "waiting" for the helper app to trigger it. This also means
the helper app is extremely simple:  an unauthenticated call that asks "Is checkpoint still
happening? Is checkpoint still happening? What about now? Are we down yet Papa Smurf?"  This
way we also fix [~sureshms]'s issue:

bq. Blindly sending kill -9 is not an option in my opinion. 

That's why it's not blind.  The helper app's *sole* purpose should be to provide the hint
to the shell code if things are so screwed up that kill -9 is the only way out.  This way
all of the key, important logic is in Java code and the one thing the Java code probably shouldn't
do (kill) is left to the shell code.
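
As a rough illustration of that split, here is a minimal Java sketch of such a helper. It is not taken from any HDFS-7991 patch: the class name, the {{Hint}} enum, and the {{stillBusy}} probe are all hypothetical stand-ins, and the real "is checkpoint still happening?" call would have to replace the probe. The helper only polls and reports a hint; acting on the hint (including any kill -9) stays with the calling shell code, which in this sketch would read the exit status.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only: not an HDFS API and not the HDFS-7991 patch. */
final class StopHelperSketch {

    /** Hint handed back to the shell code; the helper itself never kills anything. */
    enum Hint { CLEAN_EXIT, NEEDS_KILL_9 }

    /**
     * Polls the probe until it reports "no longer busy" or maxPolls is used up.
     * stillBusy stands in for the unauthenticated "is checkpoint still happening?" call.
     */
    static Hint waitForShutdown(BooleanSupplier stillBusy, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if (!stillBusy.getAsBoolean()) {
                return Hint.CLEAN_EXIT;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Interrupted while waiting: restore the flag and stop polling.
                Thread.currentThread().interrupt();
                break;
            }
        }
        // Never saw the process finish, so tell the shell a hard kill may be needed.
        return Hint.NEEDS_KILL_9;
    }

    public static void main(String[] args) {
        // Stand-in probe: reports "busy" for the first three polls, then "done".
        int[] remaining = {3};
        Hint hint = waitForShutdown(() -> remaining[0]-- > 0, 10, 100L);
        System.out.println(hint);
        // Exit status is the hint: 0 means clean, non-zero means the shell may escalate.
        System.exit(hint == Hint.CLEAN_EXIT ? 0 : 1);
    }
}
```

The poll count and sleep interval are arbitrary here; a real helper would presumably take them from configuration.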

bq. Instead of emphasizing namenode stop functionality works, I would rather see save namespace
work.

To the person who isn't looking at the code, these are effectively one and the same. If I'm
stopping the namenode, I expect it to do what is necessary to come back up in a sane state.
 Why should an admin have to make the decision here when the NN itself knows the state best?
 Telling me to run save namespace is dumb:  "Why didn't you just do it yourself, you stupid
program?" :D

bq.  Isn't there an environment variable that enables this functionality? For folks who want
stop to not save namespace or a different behavior, it can be used to go back to the previous
behavior, right?

The # of times this is going to be needed should approach zero... and in those cases, a Java
property (or properties!) is *way* better than an environment variable.  Some clueless person
is going to tell others "Hey, set this to make your system shut down faster."  The Java apps
can read the properties and do whatever is needed/desired.  This also means they can prompt
to say "are you sure?" because this is the type of operation (shutdown w/out checkpoint) that
sounds like it should never happen in an automated way.
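
To make the property-plus-prompt idea concrete, here is a minimal Java sketch. The property name {{example.namenode.stop.skip-checkpoint}} is invented for this example and is not a real Hadoop key, and the class and method names are likewise hypothetical. The point is only that the skip is off by default and, even when the property is set, still requires an interactive "y".

```java
import java.util.Properties;
import java.util.Scanner;

/** Hypothetical sketch only: the property name below is invented, not a Hadoop key. */
final class SkipCheckpointSketch {

    static final String SKIP_KEY = "example.namenode.stop.skip-checkpoint";

    /**
     * Returns true only if the property is set to "true" AND the operator confirms.
     * With the property unset, the normal shutdown logic decides about the checkpoint.
     */
    static boolean shouldSkipCheckpoint(Properties props, Scanner in) {
        if (!Boolean.parseBoolean(props.getProperty(SKIP_KEY, "false"))) {
            return false;
        }
        System.err.print("Stop without checkpoint: are you sure? [y/N] ");
        // No input at all (e.g. an automated, non-interactive run) counts as "no".
        if (!in.hasNextLine()) {
            return false;
        }
        return in.nextLine().trim().equalsIgnoreCase("y");
    }

    public static void main(String[] args) {
        // Try it with: java -Dexample.namenode.stop.skip-checkpoint=true SkipCheckpointSketch
        boolean skip = shouldSkipCheckpoint(System.getProperties(), new Scanner(System.in));
        System.out.println(skip ? "skipping checkpoint" : "checkpoint logic stays in charge");
    }
}
```

Treating missing input as "no" is one way to keep the skip from happening in an automated run.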

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991.000.patch, HDFS-7991.001.patch, HDFS-7991.002.patch, HDFS-7991.003.patch, HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to check if saving
> namespace is necessary before stopping namenode. As [~kihwal] pointed out in this [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
> in a secured cluster this new functionality requires the user to be kinit'ed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
