hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3776) NPE in NameNode with unknown blocks
Date Wed, 23 Jul 2008 01:01:34 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3776:

    Attachment: HADOOP-3776.patch

The patch essentially reverts the hunk "@@ -2780,17 +2751,8 @@" of the patch for HADOOP-3002, moving the check to the beginning of addStoredBlock().

Removing this check was fine for processReport() but not for blockReceived(), which can report a block whose file has already been deleted.
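A minimal Java sketch of the guard ordering the patch restores, assuming a drastically simplified NameNode in which blocksMap is just a HashMap from block ID to the owning file. The class StoredBlockSketch, its BlockInfo type, and the addStoredBlock(long, String) signature are hypothetical and are not FSNamesystem's real code; the sketch only shows the idea of looking the block up first and returning before anything dereferences a missing entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of the NameNode's blocksMap.
// Names and signatures are illustrative only, not Hadoop's actual API.
class StoredBlockSketch {
  // Per-block metadata: the owning file and the datanodes holding replicas.
  static final class BlockInfo {
    final String file;
    final Set<String> locations = new HashSet<>();
    BlockInfo(String file) { this.file = file; }
  }

  private final Map<Long, BlockInfo> blocksMap = new HashMap<>();

  void addBlockToFile(long blockId, String file) {
    blocksMap.put(blockId, new BlockInfo(file));
  }

  // Deleting a file drops all of its blocks from blocksMap.
  void deleteFile(String file) {
    blocksMap.values().removeIf(info -> info.file.equals(file));
  }

  // Called when a datanode reports a replica (e.g. via blockReceived()).
  // The check comes first: an unknown block is logged and ignored instead
  // of being dereferenced, which is what produced the NPE.
  boolean addStoredBlock(long blockId, String datanode) {
    BlockInfo info = blocksMap.get(blockId);
    if (info == null) {
      System.out.println("addStoredBlock: block " + blockId + " from " + datanode
          + " does not belong to any file; ignoring");
      return false;
    }
    info.locations.add(datanode);
    return true;
  }

  int replicaCount(long blockId) {
    BlockInfo info = blocksMap.get(blockId);
    return info == null ? 0 : info.locations.size();
  }

  public static void main(String[] args) {
    StoredBlockSketch nn = new StoredBlockSketch();
    nn.addBlockToFile(1L, "/user/test/5Gb");
    nn.addBlockToFile(2L, "/user/test/5Gb");
    System.out.println(nn.addStoredBlock(1L, "dn1")); // true
    nn.deleteFile("/user/test/5Gb");
    // The last block is reported after the file is gone: logged, no NPE.
    System.out.println(nn.addStoredBlock(2L, "dn1")); // false
  }
}
```

Returning early (rather than throwing) matters here because, as the report below describes, the DataNode would otherwise resend the same block indefinitely.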

> NPE in NameNode with unknown blocks
> -----------------------------------
>                 Key: HADOOP-3776
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3776
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.18.0
>         Attachments: HADOOP-3776.patch
> When a datanode has a block that the NameNode does not have, it results in an NPE at the NameNode. One of these cases results in an infinite loop of these errors, because the DataNode keeps invoking the same RPC that caused the NPE.
> One way to reproduce:
>  * On a single-DN cluster, start writing a large file (something like {{'bin/hadoop fs -put 5Gb 5Gb'}}).
>  * Now, from a different shell, delete this file ({{bin/hadoop fs -rm 5Gb}}).
>  * Most likely you will hit this.
>  * The cause is that when the DataNode invokes {{blockReceived()}} to report the last block it received, the file has already been deleted, which results in an NPE at the NameNode. Because of the way the DataNode works, it keeps invoking the same RPC with the same block and hits the same error.
> When a block does not exist in the NameNode's blocksMap, it does not belong to the cluster. Let me know if you need the trace. The NPE is at FSNamesystem.java:2800 (on trunk).
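To illustrate why the error repeats forever, here is a hypothetical Java sketch of the retry behaviour described in the report, assuming the DataNode clears its list of received blocks only after blockReceived() succeeds. RetryLoopSketch, NameNodeRpc, unguarded(), guarded() and runDataNode() are invented for illustration and do not correspond to Hadoop's actual DataNode or NameNode RPC interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a datanode keeps a list of received-but-unreported
// blocks and resends it every round until the RPC succeeds.
class RetryLoopSketch {
  interface NameNodeRpc {
    void blockReceived(List<Long> blockIds);
  }

  // NameNode without the check: dereferences the lookup result, so a block
  // whose file was deleted throws NullPointerException.
  static NameNodeRpc unguarded(Map<Long, String> blocksMap) {
    return ids -> {
      for (long id : ids) {
        String file = blocksMap.get(id);
        file.length(); // NPE when the block is not in blocksMap
      }
    };
  }

  // NameNode with the check done up front: unknown blocks are skipped.
  static NameNodeRpc guarded(Map<Long, String> blocksMap) {
    return ids -> {
      for (long id : ids) {
        if (!blocksMap.containsKey(id)) {
          continue; // block does not belong to the cluster
        }
        // ... record the replica ...
      }
    };
  }

  // Runs up to maxRounds rounds and returns how many rounds failed.
  static int runDataNode(NameNodeRpc nn, List<Long> pending, int maxRounds) {
    int failures = 0;
    for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
      try {
        nn.blockReceived(pending);
        pending.clear(); // drained only on success
      } catch (RuntimeException e) {
        failures++; // the same blocks are resent next round
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    Map<Long, String> blocksMap = new HashMap<>(); // file already deleted
    List<Long> pending = new ArrayList<>();
    pending.add(7L);
    System.out.println(runDataNode(unguarded(blocksMap), pending, 5)); // 5
    System.out.println(runDataNode(guarded(blocksMap), pending, 5));   // 0
  }
}
```

With the unguarded NameNode every round fails and the pending list never drains; with the check in place the first call succeeds and the list is cleared.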

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
