hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7991) Allow users to skip checkpoint when stopping NameNode
Date Thu, 21 May 2015 18:23:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554800#comment-14554800 ]

Jing Zhao commented on HDFS-7991:
---------------------------------

Thanks, Allen. Yes, I also just realized that JMX may not be a good solution here.

bq. to do a REST or RPC call to ask the NN what it's doing
The same question here is: what if this RPC/REST call fails (or times out)? Should we retry,
and if so, how? Or should we kill the NameNode? To me this is not fundamentally different from
the "saveNamespace" solution:
# We're using kill to trigger the shutdown hook, which does the checkpoint. This maps to the
step of sending a saveNamespace command to the NN.
# We then keep polling the state of the NameNode using a REST/RPC call, just like waiting
for the response from the saveNamespace RPC.
# Both solutions finally need to answer the same question: what if the REST/RPC call fails?
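
To make the shared failure question concrete, here is a minimal sketch of such a polling loop, in Python purely for illustration. Everything in it is an assumption rather than part of any patch: `poll_namenode`, the `check_state` callback (standing in for whatever REST/RPC call is used), and the retry limits are hypothetical. It only shows that the caller ends up with the same three outcomes either way.

```python
import time


def poll_namenode(check_state, timeout_s=60.0, interval_s=1.0, max_failures=3,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until check_state() returns True, the deadline passes,
    or too many consecutive calls fail.

    check_state is a hypothetical callback wrapping the REST/RPC call.
    Returns "done", "timeout", or "unreachable".
    """
    deadline = clock() + timeout_s
    failures = 0
    while clock() < deadline:
        try:
            if check_state():
                return "done"
            failures = 0  # the call itself succeeded, so reset the streak
        except OSError:
            # Network-level failure (ConnectionError and friends).
            failures += 1
            if failures >= max_failures:
                return "unreachable"
        sleep(interval_s)
    return "timeout"
```

Whether "timeout" and "unreachable" should lead to a retry, a hard kill, or an error to the operator is exactly the open question; the sketch deliberately leaves that decision to the caller.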

bq. This will almost certainly break init.d/rc.d/service/launchd/whatever scripts.
Yes, but I think if the checkpoint is necessary at this time, breaking these scripts may not
be that bad compared with killing the NameNode and then waiting hours for it to load
edits, or even having to fix corrupted edits.

bq. currently does not require a Kerberos credential
Regarding the auth part, how about directly parsing hdfs-site.xml to get the NameNode
fsimage/edits directory locations? Then we can check whether a checkpoint is necessary
by going through the fsimage/edits file names.
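
A rough sketch of what that file-name check could look like, again in Python only for illustration. The assumptions: storage directories are local and listed under `dfs.namenode.name.dir` (and optionally `dfs.namenode.edits.dir`), files live under `current/` and follow the usual `fsimage_<txid>` / `edits_<start>-<end>` / `edits_inprogress_<start>` naming, and "necessary" means the number of un-checkpointed transactions reaches a caller-supplied threshold. The helper names and the threshold parameter are hypothetical, and shared edits (e.g. on JournalNodes) are not covered.

```python
import os
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")
INPROGRESS_RE = re.compile(r"^edits_inprogress_(\d+)$")


def read_property(hdfs_site_path, name):
    """Return the value of a <property> in hdfs-site.xml, or None if absent."""
    root = ET.parse(hdfs_site_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def storage_dirs(value):
    """Split a comma-separated list of paths or file: URIs into local paths."""
    dirs = []
    for item in value.split(","):
        item = item.strip()
        if item:
            dirs.append(urlparse(item).path if item.startswith("file:") else item)
    return dirs


def uncheckpointed_txns(file_names):
    """Estimate transactions logged since the newest fsimage, from names only.

    Returns None when no fsimage file is found.
    """
    last_image, last_edit = -1, -1
    for name in file_names:
        if m := FSIMAGE_RE.match(name):
            last_image = max(last_image, int(m.group(1)))
        elif m := EDITS_RE.match(name):
            last_edit = max(last_edit, int(m.group(2)))
        elif m := INPROGRESS_RE.match(name):
            # The name gives only the first txid of the open segment,
            # so this counts none of its transactions (a lower bound).
            last_edit = max(last_edit, int(m.group(1)) - 1)
    if last_image < 0:
        return None
    return max(0, last_edit - last_image)


def checkpoint_needed(hdfs_site_path, txn_threshold):
    """True if the estimated backlog reaches txn_threshold (or is unknown)."""
    names = []
    for key in ("dfs.namenode.name.dir", "dfs.namenode.edits.dir"):
        for d in storage_dirs(read_property(hdfs_site_path, key) or ""):
            current = os.path.join(d, "current")
            if os.path.isdir(current):
                names.extend(os.listdir(current))
    pending = uncheckpointed_txns(names)
    # Treating "unknown" as "needed" is a cautious choice, not a requirement.
    return pending is None or pending >= txn_threshold
```

This needs only read access to the local config and storage directories, so no Kerberos credential is involved, but the estimate is a lower bound because the length of the in-progress segment is not visible from its name.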

> Allow users to skip checkpoint when stopping NameNode
> -----------------------------------------------------
>
>                 Key: HDFS-7991
>                 URL: https://issues.apache.org/jira/browse/HDFS-7991
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7991.000.patch, HDFS-7991.001.patch, HDFS-7991.002.patch, HDFS-7991.003.patch, HDFS-7991.004.patch
>
>
> This is a follow-up jira of HDFS-6353. HDFS-6353 adds the functionality to check if saving
> namespace is necessary before stopping namenode. As [~kihwal] pointed out in this [comment|https://issues.apache.org/jira/browse/HDFS-6353?focusedCommentId=14380898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14380898],
> in a secured cluster this new functionality requires the user to be kinit'ed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
