hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-686) NullPointerException is thrown while merging edit log and image
Date Thu, 08 Oct 2009 21:03:31 GMT

    [ https://issues.apache.org/jira/browse/HDFS-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763696#action_12763696 ]

Hairong Kuang commented on HDFS-686:

The edit log showed that the file that caused the problem was moved to the trash right before
it was opened for read, and the open then reset its access time. But how could the open succeed?
It turned out to be a synchronization problem: the NameNode fetches the inode of the file to be
opened without holding the FSNamesystem lock.

This is approximately what happened. A request to move the file to the trash and a request to
open it arrived at the NN simultaneously, so the NN:
1. fetched the inode for the file;
2. moved the file to trash;
3. opened the file and set its access time.

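Although step 2 removed the file's inode, step 3 succeeded because it accessed the file through
the inode fetched in step 1. The access-time update was therefore written to the edit log for a
path that no longer existed, which is what the merge later tripped over.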
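
The interleaving above can be sketched with a toy model. This is only an illustration under
stated assumptions: the class and method names (StaleInodeRace, Namespace, replay, and so on)
and the paths are hypothetical and are not the real HDFS classes. The three steps are run
sequentially in the racy order rather than on real threads, so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these names are the real HDFS classes.
class StaleInodeRace {
    static class Inode { long accessTime; }

    static class Namespace {
        final Map<String, Inode> inodes = new HashMap<>();
        final List<String[]> editLog = new ArrayList<>();

        Inode lookup(String path) { return inodes.get(path); }

        void rename(String src, String dst) {
            Inode inode = inodes.remove(src);
            inodes.put(dst, inode);
            editLog.add(new String[] {"RENAME", src, dst});
        }

        // Updates the access time through an already-fetched inode and logs it by path.
        void setAccessTime(String path, Inode inode, long time) {
            inode.accessTime = time;
            editLog.add(new String[] {"TIMES", path, Long.toString(time)});
        }
    }

    // Replays an edit log against a namespace, as a checkpoint merge would.
    static void replay(List<String[]> edits, Namespace target) {
        for (String[] e : edits) {
            if (e[0].equals("RENAME")) {
                target.inodes.put(e[2], target.inodes.remove(e[1]));
            } else {
                // No null check: throws NullPointerException if the path is gone.
                target.lookup(e[1]).accessTime = Long.parseLong(e[2]);
            }
        }
    }

    // Runs the three steps in the racy order and returns the resulting edit log.
    static List<String[]> racyInterleaving() {
        Namespace ns = new Namespace();
        ns.inodes.put("/user/a/f", new Inode());
        Inode stale = ns.lookup("/user/a/f");          // 1. fetch inode (no lock held)
        ns.rename("/user/a/f", "/user/a/.Trash/f");    // 2. move the file to trash
        ns.setAccessTime("/user/a/f", stale, 42L);     // 3. open succeeds via stale inode
        return ns.editLog;
    }

    // Replays the edits on a namespace that starts from the pre-race image.
    static boolean replayFails(List<String[]> edits) {
        Namespace fresh = new Namespace();
        fresh.inodes.put("/user/a/f", new Inode());
        try {
            replay(edits, fresh);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("replay fails: " + replayFails(racyInterleaving()));
    }
}
```

In this sketch the live namespace never notices a problem, because step 3 only touches the
stale reference; the failure appears only when the log is replayed. Doing the lookup and the
access-time update under one lock would presumably close the window, though the comment above
does not spell out the fix.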

> NullPointerException is thrown while merging edit log and image
> ---------------------------------------------------------------
>                 Key: HDFS-686
>                 URL: https://issues.apache.org/jira/browse/HDFS-686
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.20.2
> Our secondary name node is not able to start because of a NullPointerException:
> ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1232)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1221)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:590)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
>         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
>         at java.lang.Thread.run(Thread.java:619)
> This was caused by setting the access time on a non-existent file.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
