hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9697) NN fails to restart due to corrupt fsimage caused by snapshot handling
Date Thu, 28 Jan 2016 22:49:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122478#comment-15122478 ]

Jing Zhao commented on HDFS-9697:
---------------------------------

Looks like the NPE was caused by a recent change in the quota-updating code: we have already
cleared the file's block list but still try to compute its new quota usage in order to get the
usage change. It can be fixed by the following change:
{code}
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java
@@ -501,7 +501,7 @@ public long getHeaderLong() {
   /** @return the blocks of the file. */
   @Override // BlockCollection
   public BlockInfo[] getBlocks() {
-    return this.blocks;
+    return this.blocks == null ? BlockInfo.EMPTY_ARRAY : this.blocks;
   }
{code}
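The underlying pattern in the patch, independent of HDFS internals, is to return a shared empty array instead of {{null}} so that callers can safely iterate the result even after the block list has been cleared. A minimal self-contained sketch of that pattern ({{Block}} and {{FileNode}} below are hypothetical stand-ins for {{BlockInfo}} and {{INodeFile}}, not the real classes):

```java
// Minimal stand-in illustrating the null-guard pattern from the patch above.
// "Block" and "FileNode" are hypothetical stand-ins, not the HDFS classes.
final class Block {
    // One shared immutable empty array avoids per-call allocation,
    // mirroring BlockInfo.EMPTY_ARRAY in the patch.
    static final Block[] EMPTY_ARRAY = new Block[0];
}

final class FileNode {
    private Block[] blocks; // may be null once the block list is cleared

    void clearBlocks() {
        this.blocks = null; // e.g. during deletion / quota cleanup
    }

    Block[] getBlocks() {
        // Guard: callers that compute quota usage iterate the result,
        // so return an empty array rather than null.
        return this.blocks == null ? Block.EMPTY_ARRAY : this.blocks;
    }
}

public class Main {
    public static void main(String[] args) {
        FileNode f = new FileNode();
        f.clearBlocks();
        // Without the guard, iterating here would throw a NullPointerException.
        long count = 0;
        for (Block b : f.getBlocks()) {
            count++;
        }
        System.out.println(count); // prints 0
    }
}
```

Returning the shared empty array keeps every caller's iteration code unchanged while removing the NPE risk, which is why this is generally preferred over making each caller null-check.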

Looks like the fsimage corruption was caused by the same bug as HDFS-9406. With the fix applied,
the test case passes.

> NN fails to restart due to corrupt fsimage caused by snapshot handling
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9697
>                 URL: https://issues.apache.org/jira/browse/HDFS-9697
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> This is related to HDFS-9406, but not quite the same symptom.
> {quote}
> ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReference(FSImageFormatPBSnapshot.java:114)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReferenceSection(FSImageFormatPBSnapshot.java:105)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:258)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1062)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:766)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:589)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:818)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:797)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1561)
> {quote}
> A sequence that I found can reproduce the exception stack is:
> {code}
> hadoop fs -mkdir /st
> hadoop fs -mkdir /st/y
> hadoop fs -mkdir /nonst
> hadoop fs -mkdir /nonst/trash
> hdfs dfsadmin -allowSnapshot /st
> hdfs dfs -createSnapshot /st s0
> hadoop fs -touchz /st/y/nn.log
> hdfs dfs -createSnapshot /st s1
> hadoop fs -mv /st/y/nn.log /st/y/nn1.log
> hdfs dfs -createSnapshot /st s2
> hadoop fs -mkdir /nonst/trash/st
> hadoop fs -mv /st/y /nonst/trash/st
> hadoop fs -rmr /nonst/trash
> hdfs dfs -deleteSnapshot /st s1
> hdfs dfs -deleteSnapshot /st s2
> hdfs dfsadmin -safemode enter
> hdfs dfsadmin -saveNamespace
> hdfs dfsadmin -safemode leave
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
