hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3886) Shutdown requests can possibly check for checkpoint issues (corrupted edits) and save a good namespace copy before closing down?
Date Sun, 02 Sep 2012 11:05:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446916#comment-13446916 ]

Steve Loughran commented on HDFS-3886:
--------------------------------------

I don't think you could easily do much with init.d, as that is invoked by the OS when it's
doing a shutdown and may be tearing down large parts of the system: fast shutdowns are always
appreciated, before the monitoring layers escalate. The same holds for Linux clustering
resource agents: the slower the shutdown, the longer it takes to migrate a service to a new
node in the HA cluster.

Perhaps a way could be provided over RPC to tell the NN to block and checkpoint; dfsAdmin
could be the gateway to this. If you could do this without even stopping the process, you
have something you can test more easily and a better ops experience: you just issue a
{{hadoop dfsadmin --checkpoint}} command, your NN goes into safe mode briefly, the logs are
sorted out, and things continue.
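
A rough sketch of that flow, assuming it is approximated with dfsadmin subcommands that
already exist (-safemode enter, -saveNamespace, -safemode leave) rather than the single
--checkpoint command suggested above, which is only a proposal. The checkpoint() helper and
its run parameter are hypothetical names used for illustration.

```python
import subprocess

# Existing dfsadmin subcommands. The one-shot "--checkpoint" command from the
# comment above is a proposal and is NOT assumed to exist.
ENTER = ["hdfs", "dfsadmin", "-safemode", "enter"]
SAVE = ["hdfs", "dfsadmin", "-saveNamespace"]
LEAVE = ["hdfs", "dfsadmin", "-safemode", "leave"]


def checkpoint(run=subprocess.check_call):
    """Enter safe mode, save the namespace, then always try to leave safe mode.

    `run` is a hypothetical injection point so the sequence can be exercised
    without a cluster; by default each command is executed as a subprocess.
    """
    run(ENTER)
    try:
        run(SAVE)
    finally:
        # Leave safe mode even if the save failed, so the NN is not left
        # read-only by this helper.
        run(LEAVE)
```

This leaves safe mode unconditionally, so it would not be appropriate if the NN was
already in safe mode for another reason before the call.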
                
> Shutdown requests can possibly check for checkpoint issues (corrupted edits) and save
> a good namespace copy before closing down?
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3886
>                 URL: https://issues.apache.org/jira/browse/HDFS-3886
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Priority: Minor
>
> HDFS-3878 sorta gives me this idea. Aside of having a method to download it to a different
> location, we can also lock up the namesystem (or deactivate the client rpc server) and save
> the namesystem before we complete up the shutdown.
> The init.d/shutdown scripts would have to work with this somehow though, to not kill
> -9 it when in-process. Also, the new image may be stored in a shutdown.chkpt directory, to
> not interfere in the regular dirs, but still allow easier recovery.
> Obviously this will still not work if all directories are broken. So maybe we could have
> some configs to tackle that as well?
> I haven't thought this through, so let me know what part is wrong to do :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
