hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chengbing Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure
Date Fri, 10 Apr 2015 16:01:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14489849#comment-14489849
] 

Chengbing Liu commented on HDFS-8113:
-------------------------------------

Hi Aaron, I have no idea where the corrupt replica come from. From the log I see that the
problem had been there for at lease one month. A few DataNodes have the same problem. Their
block reports to NameNode will always throw the described exception. After the exception,
they wait one second before attempting another block report, and so forth.

However, the HDFS as a whole still works. The only bad thing is that the RPC processing time
in average is around 10ms, which is too high compared to our other clusters. We expect this
to be around 1ms.

Does that answer your question?

> NullPointerException in BlockInfoContiguous causes block report failure
> -----------------------------------------------------------------------
>
>                 Key: HDFS-8113
>                 URL: https://issues.apache.org/jira/browse/HDFS-8113
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>            Assignee: Chengbing Liu
>         Attachments: HDFS-8113.patch
>
>
> The following copy constructor can throw NullPointerException if {{bc}} is null.
> {code}
>   protected BlockInfoContiguous(BlockInfoContiguous from) {
>     this(from, from.bc.getBlockReplication());
>     this.bc = from.bc;
>   }
> {code}
> We have observed that some DataNodes keeps failing doing block reports with NameNode.
The stacktrace is as follows. Though we are not using the latest version, the problem still
exists.
> {quote}
> 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException
in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.(BlockInfo.java:80)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.(BlockManager.java:1696)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
> at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
> at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message