hadoop-hdfs-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2177) Restarting the namenode when the secondary namenode is checkpointing seems to remove everything from /
Date Mon, 25 Jul 2011 21:32:15 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2177:
-------------------------------

    Attachment: test_HDFS-2177.sh

The version of HDFS running on the 10-node clusters was remotes/origin/MR-279. I wasn't able
to reproduce this issue on trunk (which was running on my single-node cluster). On MR-279
( e3d9a2bcbcab817043b1c4c41efb7036ce00904f ) it's pretty easy to reproduce. I'm attaching a
test you can use to replicate the behavior on a single-node cluster. I'm not going to
investigate why it's happening any further, because the issue doesn't occur on trunk
(perhaps it has already been fixed?).

To run the test:
1. Please set your checkpoint period (dfs.namenode.checkpoint.period) to 10 seconds.
2. The idea is to make the NN shut down at exactly the time the SNN is doing a checkpoint.
On my machine the fillHDFS function takes just long enough to cause that. You might have to
adjust the sleep times while watching the NN and SNN logs to replicate this.
3. The test fills up /tmp to create a big edits file. I'm not sure whether that is necessary.
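
For step 1, a minimal sketch of the setting, assuming it goes in the hdfs-site.xml read by
the SNN (the file location is an assumption about your setup; the value is in seconds):

```xml
<!-- hdfs-site.xml fragment: checkpoint every 10 seconds, for this test only -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>10</value>
</property>
```

The SNN will probably need a restart to pick up the new value, and the period should be
restored afterwards, since 10 seconds is far too aggressive for normal use.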

> Restarting the namenode when the secondary namenode is checkpointing seems to remove everything from /
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2177
>                 URL: https://issues.apache.org/jira/browse/HDFS-2177
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: test_HDFS-2177.sh
>
>
> This was again discovered by Arpit Gupta! Restarting the namenode when the secondary
> namenode is checkpointing seems to remove everything from /

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira