hadoop-hdfs-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
Date Tue, 21 May 2013 20:45:21 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
-------------------------------

    Description: 
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes, causing missing blocks.
2. Namenode was put in safe mode.
3. Datanodes were restarted on the dead nodes.
4. Waited a long time for the NN UI to reflect the recovered blocks.
5. Forced the NN out of safe mode, and suddenly there were no more missing blocks.
{quote}

I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval
to 1 and killed the DN to simulate a "lost" datanode. The opposite case is also broken: a
Datanode failing while the NN is in safemode does not produce a missing blocks message.
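Setting the recheck interval to 1 makes the "lost" datanode show up quickly. The sketch below shows why with a little arithmetic. It assumes the NameNode declares a DataNode dead after 2 * dfs.namenode.heartbeat.recheck-interval (milliseconds) + 10 * dfs.heartbeat.interval (seconds). It also assumes the defaults of 300000 ms and 3 s. The class and method names are hypothetical and are not Hadoop code.

```java
/**
 * Hypothetical helper, not part of Hadoop. It estimates how long the NameNode
 * waits without heartbeats before marking a DataNode dead. It assumes the
 * formula 2 * recheckInterval (ms) + 10 * heartbeatInterval (s, converted to ms).
 */
public class HeartbeatExpirySketch {

    // Assumed defaults: dfs.namenode.heartbeat.recheck-interval = 300000 ms,
    // dfs.heartbeat.interval = 3 s.
    static final long DEFAULT_RECHECK_INTERVAL_MS = 300_000L;
    static final long DEFAULT_HEARTBEAT_INTERVAL_SEC = 3L;

    /** Returns the assumed dead-node timeout in milliseconds. */
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    public static void main(String[] args) {
        long withDefaults =
            expireIntervalMs(DEFAULT_RECHECK_INTERVAL_MS, DEFAULT_HEARTBEAT_INTERVAL_SEC);
        long withRepro = expireIntervalMs(1L, DEFAULT_HEARTBEAT_INTERVAL_SEC);
        System.out.println("defaults: " + withDefaults + " ms");            // defaults: 630000 ms
        System.out.println("recheck-interval=1: " + withRepro + " ms");     // recheck-interval=1: 30002 ms
    }
}
```

Under these assumptions, the repro setting shrinks the dead-node window from about 10.5 minutes to about 30 seconds. A killed DN is then reported lost almost immediately.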

Without the NN updating this list of missing blocks, the grid admins will not know when to
take the cluster out of safemode.

  was:
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes, causing missing blocks.
2. Namenode was put in safe mode.
3. Datanodes were restarted on the dead nodes.
4. Waited a long time for the NN UI to reflect the recovered blocks.
5. Forced the NN out of safe mode, and suddenly there were no more missing blocks.
{quote}

I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval
to 1 and killed the DN to simulate a "lost" datanode.

Without the NN updating this list of missing blocks, the grid admins will not know when to
take the cluster out of safemode.

    
> Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-4832
>                 URL: https://issues.apache.org/jira/browse/HDFS-4832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>            Priority: Critical
>         Attachments: HDFS-4832.patch
>

--
This message is automatically generated by JIRA.