hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Carlos Valiente (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
Date Fri, 26 Mar 2010 09:22:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850048#action_12850048
] 

Carlos Valiente commented on HDFS-686:
--------------------------------------

I'm seeing this problem on a 4-node cluster running 0.21.0, revision 916530:

{noformat}
2010-03-25 19:57:33,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1399)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1387)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:672)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:401)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:368)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1172)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:594)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:476)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:353)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:317)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:219)
        at java.lang.Thread.run(Thread.java:619)
{noformat}

The content of one of my {{dfs.name.dir}} is available at [http://static.ecmwf.int/hadoop/dfs-2010-03-26-08-44.tar.bz2].

I've put a consolidated log file of all Hadoop processes at [http://static.ecmwf.int/hadoop/hadoop-2010-03-25.log.bz2].
The NPE happens at line 206027. When the NPE was thrown, there were serveral writes taking
place.

Not sure if it matters, but according to my Nagios monitor, one of the data nodes (web211,
136.156.30.211) run out of disk space at 2010-03-25 17:55:26 (about a minute before the edit
log roll started)


> NullPointerException is thrown while merging edit log and image
> ---------------------------------------------------------------
>
>                 Key: HDFS-686
>                 URL: https://issues.apache.org/jira/browse/HDFS-686
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>         Attachments: nullSetTime.patch, openNPE-trunk.patch, openNPE.patch
>
>
> Our secondary name node is not able to start on NullPointerException:
> ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1232)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1221)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>         at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:590)
>         at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
>         at java.lang.Thread.run(Thread.java:619)
> This was caused by setting access time on a non-existent file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message