hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6563) NameNode cannot save fsimage in certain circumstances when snapshots are in use
Date Thu, 19 Jun 2014 00:09:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036702#comment-14036702 ]

Aaron T. Myers commented on HDFS-6563:
--------------------------------------

I've filed this as Critical for now, but if folks think it should be a Blocker, I'm fine
with raising the priority.

The impact is serious, but the bug itself is straightforward. In
{{FSImageFormatPBINode#save(OutputStream, INodeFile)}} we have the following code:

{code}
        for (Block block : n.getBlocks()) {
          b.addBlocks(PBHelper.convert(block));
        }
{code}

This loop implicitly assumes that {{n.getBlocks()}} never returns {{null}}; iterating over a
{{null}} array with a for-each loop throws a {{NullPointerException}}. That assumption does not
hold in the scenario given in the issue description, because of this code in
{{FileWithSnapshotFeature#collectBlocksBeyondMax}}:

{code}
        final BlockInfo[] newBlocks;
        if (n == 0) {
          newBlocks = null;
        } else {
          newBlocks = new BlockInfo[n];
          System.arraycopy(oldBlocks, 0, newBlocks, 0, n);
        }
        
        // set new blocks
        file.setBlocks(newBlocks);
{code}

When the NameNode attempts to save an fsimage after {{collectBlocksBeyondMax}} has set a
file's blocks to {{null}}, errors like the following appear in the logs:

{noformat}
2014-06-18 16:55:11,295 ERROR namenode.FSImage (FSImage.java:run(988)) - Unable to save image for /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:537)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:518)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeINodeSection(FSImageFormatPBINode.java:491)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:412)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:457)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:393)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:931)
	at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:982)
	at java.lang.Thread.run(Thread.java:724)
2014-06-18 16:55:11,295 ERROR namenode.FSImage (FSImage.java:run(988)) - Unable to save image for /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:537)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:518)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeINodeSection(FSImageFormatPBINode.java:491)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:412)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:457)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:393)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:931)
	at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:982)
	at java.lang.Thread.run(Thread.java:724)
2014-06-18 16:55:11,297 ERROR common.Storage (NNStorage.java:reportErrorsOnDirectory(808)) - Error reported on storage directory Storage Directory /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
2014-06-18 16:55:11,297 WARN  common.Storage (NNStorage.java:reportErrorsOnDirectory(813)) - About to remove corresponding storage: /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
2014-06-18 16:55:11,297 ERROR common.Storage (NNStorage.java:reportErrorsOnDirectory(808)) - Error reported on storage directory Storage Directory /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
2014-06-18 16:55:11,297 WARN  common.Storage (NNStorage.java:reportErrorsOnDirectory(813)) - About to remove corresponding storage: /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
{noformat}
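
To make the interaction between the two snippets concrete, here is a minimal, self-contained
sketch that reproduces the same failure mode outside of HDFS. Everything in it is a stand-in:
{{NullBlocksSketch}}, {{FileSketch}}, {{truncateTo}}, {{saveUnguarded}} and {{saveGuarded}} are
hypothetical names, and plain strings replace {{BlockInfo}}. It only illustrates why a {{null}}
block array breaks the saver loop and what a null guard would look like; it is not the actual
patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the bug; none of these names exist in Hadoop.
class NullBlocksSketch {

  // Stand-in for INodeFile: only the block array is modeled.
  static class FileSketch {
    private String[] blocks; // stand-in for BlockInfo[]

    FileSketch(String[] blocks) { this.blocks = blocks; }
    String[] getBlocks() { return blocks; }
    void setBlocks(String[] blocks) { this.blocks = blocks; }
  }

  // Mirrors the quoted truncation logic: keeping zero blocks stores null,
  // not an empty array.
  static void truncateTo(FileSketch file, int n) {
    final String[] oldBlocks = file.getBlocks();
    final String[] newBlocks;
    if (n == 0) {
      newBlocks = null;
    } else {
      newBlocks = new String[n];
      System.arraycopy(oldBlocks, 0, newBlocks, 0, n);
    }
    file.setBlocks(newBlocks);
  }

  // Mirrors the quoted saver loop: a for-each over a null array throws
  // NullPointerException.
  static List<String> saveUnguarded(FileSketch file) {
    List<String> out = new ArrayList<>();
    for (String block : file.getBlocks()) {
      out.add(block);
    }
    return out;
  }

  // One possible guard: treat a null block array as "no blocks".
  static List<String> saveGuarded(FileSketch file) {
    List<String> out = new ArrayList<>();
    String[] blocks = file.getBlocks();
    if (blocks != null) {
      for (String block : blocks) {
        out.add(block);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    FileSketch f = new FileSketch(new String[] {"blk_1"});
    truncateTo(f, 0); // leaves the file with blocks == null

    try {
      saveUnguarded(f);
    } catch (NullPointerException e) {
      System.out.println("unguarded save threw NullPointerException");
    }
    System.out.println("guarded save wrote " + saveGuarded(f).size() + " blocks");
  }
}
```

Under these assumptions there are two plausible directions for a fix: guard the loop in the
saver, as {{saveGuarded}} does, or have the truncation path store an empty array instead of
{{null}} so that every existing caller of {{getBlocks()}} stays safe.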

> NameNode cannot save fsimage in certain circumstances when snapshots are in use
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-6563
>                 URL: https://issues.apache.org/jira/browse/HDFS-6563
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, snapshots
>    Affects Versions: 2.4.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>            Priority: Critical
>
> Checkpoints will start to fail and the NameNode will not be able to manually saveNamespace if the following set of steps occurs:
> # A zero-length file appears in a snapshot
> # That file is later lengthened to include at least one block
> # That file is subsequently deleted from the present file system but remains in the snapshot
> More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)