hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
Date Thu, 16 Apr 2015 19:25:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498553#comment-14498553 ]

Colin Patrick McCabe commented on HDFS-8113:
--------------------------------------------

There are already a bunch of places in the code where we check whether BlockCollection is
null before doing something with it.  Example:
{code}
    if (block instanceof BlockInfoContiguous) {
      BlockCollection bc = ((BlockInfoContiguous) block).getBlockCollection();
      String fileName = (bc == null) ? "[orphaned]" : bc.getName();
      out.print(fileName + ": ");
    }
{code}

Also:
{code}
  private int getReplication(Block block) {
    final BlockCollection bc = blocksMap.getBlockCollection(block);
    return bc == null ? 0 : bc.getBlockReplication();
  }
{code}

I think that the majority of cases already have a check.  My suggestion is just that we extend
this checking against null to all uses of the BlockInfoContiguous structure's block collection.
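
As a rough illustration of what uniform null checking could look like, here is a minimal sketch. It uses simplified stand-in types ({{BlockCollectionStandIn}}, {{BlockInfoStandIn}}) and hypothetical helper names ({{fileNameOf}}, {{replicationOf}}); none of these are the actual HDFS classes or methods. The fallback values simply mirror the two snippets above.

```java
// Simplified stand-ins for illustration only; these are NOT the real HDFS classes.
interface BlockCollectionStandIn {
  String getName();
  short getBlockReplication();
}

final class BlockInfoStandIn {
  // May be null when the block has no associated file (an "orphaned" block).
  private final BlockCollectionStandIn bc;

  BlockInfoStandIn(BlockCollectionStandIn bc) {
    this.bc = bc;
  }

  BlockCollectionStandIn getBlockCollection() {
    return bc;
  }
}

// Hypothetical helpers that keep the null check in one place, so that
// callers never dereference the block collection directly.
final class NullSafeBlockAccess {
  private NullSafeBlockAccess() {}

  static String fileNameOf(BlockInfoStandIn block) {
    BlockCollectionStandIn bc = block.getBlockCollection();
    return (bc == null) ? "[orphaned]" : bc.getName();
  }

  static int replicationOf(BlockInfoStandIn block) {
    BlockCollectionStandIn bc = block.getBlockCollection();
    return (bc == null) ? 0 : bc.getBlockReplication();
  }
}
```

Routing every access through helpers like these is only one option; the same effect could be had by auditing each call site individually.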

If the problem is too difficult to reproduce with a {{MiniDFSCluster}}, perhaps we can just
do a unit test of the copy constructor itself.
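
A unit test of that kind might look roughly like the sketch below. It is self-contained and uses stand-in types ({{FileStandIn}}, {{BlockStandIn}}) instead of the real {{BlockInfoContiguous}}, and plain checks instead of a test framework. Falling back to a replication of 0 for an orphaned block is an assumption made for illustration, not necessarily the fix adopted for this issue.

```java
// Stand-in types for illustration only; NOT the real HDFS classes.
interface FileStandIn {
  short getBlockReplication();
}

class BlockStandIn {
  final FileStandIn bc;      // null when the block has no associated file
  final short replication;

  BlockStandIn(FileStandIn bc, short replication) {
    this.bc = bc;
    this.replication = replication;
  }

  // Null-safe copy constructor. Assumption: an orphaned block is copied
  // with replication 0 instead of dereferencing a null block collection.
  BlockStandIn(BlockStandIn from) {
    this(from.bc, from.bc == null ? (short) 0 : from.bc.getBlockReplication());
  }
}

final class CopyConstructorSketchTest {
  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  // Copying a block whose collection is null must not throw.
  static void copyOfOrphanedBlockDoesNotThrow() {
    BlockStandIn orphan = new BlockStandIn(null, (short) 3);
    BlockStandIn copy = new BlockStandIn(orphan);
    check(copy.bc == null, "copy should stay orphaned");
    check(copy.replication == 0, "orphaned copy should use the fallback replication");
  }

  // Copying a block with a collection keeps the collection and its replication.
  static void copyOfAttachedBlockKeepsReplication() {
    FileStandIn file = () -> (short) 2;
    BlockStandIn copy = new BlockStandIn(new BlockStandIn(file, (short) 2));
    check(copy.bc == file, "copy should share the block collection");
    check(copy.replication == 2, "copy should take replication from the collection");
  }

  public static void main(String[] args) {
    copyOfOrphanedBlockDoesNotThrow();
    copyOfAttachedBlockKeepsReplication();
  }
}
```

The first test case is the one that would fail with a {{NullPointerException}} against a copy constructor written like the one in the issue description.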

As I said earlier, I don't understand the rationale for keeping blocks with no associated
INode out of the BlocksMap.  It complicates the block report since it requires us to check
whether each block has an associated inode or not before adding it to the BlocksMap.  But
if that change seems too ambitious for this JIRA, we can deal with that later.

> NullPointerException in BlockInfoContiguous causes block report failure
> -----------------------------------------------------------------------
>
>                 Key: HDFS-8113
>                 URL: https://issues.apache.org/jira/browse/HDFS-8113
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>            Assignee: Chengbing Liu
>         Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
>     this(from, from.bc.getBlockReplication());
>     this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keep failing to send block reports to the NameNode.
> The stack trace is as follows. Though we are not using the latest version, the problem still
> exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
