hadoop-hdfs-issues mailing list archives

From "Jiandan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-12638) NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
Date Fri, 13 Oct 2017 09:49:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201540#comment-16201540
] 

Jiandan Yang  edited comment on HDFS-12638 at 10/13/17 9:48 AM:
----------------------------------------------------------------

Datanode recovery failed because the new block length is Long.MAX_VALUE:
{code:java}
2017-10-09 19:19:17,054 INFO [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@437346ab]
org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode at nn_hostname/xx.xxx.xx.xxx:8020
calls recoverBlock(BP-1721125339-xx.xxx.xx.xxx-1505883414013:blk_1084203820_11907141, targets=[DatanodeInfoWithStorage[xx.xxx.xx.aaa:50010,null,null],
DatanodeInfoWithStorage[xx.xxx.xx.bbb:50010,null,null], DatanodeInfoWithStorage[xx.xxx.xx.ccc:50010,null,null]],
newGenerationStamp=11907145, newBlock=blk_1084203824_11907145)
2017-10-09 19:19:17,055 INFO [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@437346ab]
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: initReplicaRecovery:
blk_1084203820_11907141, recoveryId=11907145, replica=FinalizedReplica, blk_1084203820_11907141,
FINALIZED
  getNumBytes()     = 7
  getBytesOnDisk()  = 7
  getVisibleLength()= 7
  getVolume()       = /dump/10/dfs/data/current
  getBlockFile()    = /dump/10/dfs/data/current/BP-1721125339-xx.xxx.xx.xxx-1505883414013/current/finalized/subdir31/subdir3/blk_1084203820
2017-10-09 19:19:17,055 INFO [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@437346ab]
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: initReplicaRecovery:
changing replica state for blk_1084203820_11907141 from FINALIZED to RUR
2017-10-09 19:19:17,058 WARN [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@437346ab]
org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to updateBlock (newblock=BP-1721125339-xx.xxx.xx.xxx-1505883414013:blk_1084203824_11907145,
datanode=DatanodeInfoWithStorage[xx.xxx.xx.aaa:50010,null,null])
org.apache.hadoop.ipc.RemoteException(java.io.IOException): rur.getNumBytes() < newlength
= 9223372036854775807, rur=ReplicaUnderRecovery, blk_1084203820_11907141, RUR
  getNumBytes()     = 7
  getBytesOnDisk()  = 7
  getVisibleLength()= 7
  getVolume()       = /dump/9/dfs/data/current
  getBlockFile()    = /dump/9/dfs/data/current/BP-1721125339-xx.xxx.xx.xxx-1505883414013/current/finalized/subdir31/subdir3/blk_1084203820
  recoveryId=11907145
  original=FinalizedReplica, blk_1084203820_11907141, FINALIZED
  getNumBytes()     = 7
  getBytesOnDisk()  = 7
  getVisibleLength()= 7
  getVolume()       = /dump/9/dfs/data/current
  getBlockFile()    = /dump/9/dfs/data/current/BP-1721125339-xx.xxx.xx.xxx-1505883414013/current/finalized/subdir31/subdir3/blk_1084203820
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.updateReplicaUnderRecovery(FsDatasetImpl.java:2736)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.updateReplicaUnderRecovery(FsDatasetImpl.java:2678)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.updateReplicaUnderRecovery(DataNode.java:2776)
        at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:78)
        at org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3107)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)


        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483)
        at org.apache.hadoop.ipc.Client.call(Client.java:1429)
        at org.apache.hadoop.ipc.Client.call(Client.java:1339)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy22.updateReplicaUnderRecovery(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:77)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$600(BlockRecoveryWorker.java:60)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:283)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:175)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:382)
        at java.lang.Thread.run(Thread.java:834)
....
....
....
2017-10-09 19:19:17,060 WARN [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@437346ab]
org.apache.hadoop.hdfs.server.datanode.DataNode: recoverBlocks FAILED: RecoveringBlock{BP-1721125339-xx.xxx.xx.xxx-1505883414013:blk_1084203820_11907141;
getBlockSize()=7; corrupt=false; offset=-1; locs=[DatanodeInfoWithStorage[xx.xxx.xx.aaa:50010,null,null],
DatanodeInfoWithStorage[xx.xxx.xx.bbb:50010,null,null], DatanodeInfoWithStorage[xx.xxx.xx.ccc:50010,null,null]]}
java.io.IOException: Cannot recover BP-1721125339-xx.xxx.xx.xxx-1505883414013:blk_1084203820_11907141,
the following 3 data-nodes failed {
  DatanodeInfoWithStorage[xx.xxx.xx.aaa:50010,null,null]
  DatanodeInfoWithStorage[xx.xxx.xx.bbb:50010,null,null]
  DatanodeInfoWithStorage[xx.xxx.xx.ccc:50010,null,null]
}
{code}
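For reference, a minimal sketch of the length check that is rejecting this recovery, as reported in the RemoteException above (thrown from FsDatasetImpl#updateReplicaUnderRecovery). The method below is illustrative and not copied from the Hadoop source; the parameter names are assumptions.

{code:java}
import java.io.IOException;

class RecoveryLengthCheckSketch {
  // Illustrative only: mirrors the check named in the RemoteException above
  // ("rur.getNumBytes() < newlength"); not the actual FsDatasetImpl code.
  static void checkNewLength(long replicaNumBytes, long newLength) throws IOException {
    // In this incident newLength arrived as Long.MAX_VALUE (9223372036854775807),
    // so the 7-byte finalized replica can never satisfy the check; every datanode
    // rejects the updateBlock call and the whole block recovery fails.
    if (replicaNumBytes < newLength) {
      throw new IOException("rur.getNumBytes() < newlength = " + newLength
          + ", rur.getNumBytes() = " + replicaNumBytes);
    }
  }
}
{code}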



> NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12638
>                 URL: https://issues.apache.org/jira/browse/HDFS-12638
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Jiandan Yang 
>
> The active NameNode exits due to an NPE. I can confirm that the BlockCollection passed in when
creating ReplicationWork is null, but I do not know why it is null. Looking through the history,
I found that [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check for
whether the BlockCollection is null.
> The NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
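On the quoted NPE: a minimal, hypothetical sketch of the kind of null guard that HDFS-9754 appears to have removed from ReplicationWork#chooseTargets. The stand-in type and method names below are assumptions for illustration, not the actual Hadoop code or a proposed patch.

{code:java}
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the HDFS-internal BlockCollection type; illustration only.
class BlockCollectionStub {
  String getName() { return "/some/file"; }
}

class ChooseTargetsSketch {
  // Sketch of a defensive null check of the kind that appears to have been removed:
  // if the block's owning BlockCollection is null (e.g. the file was deleted after
  // the replication work was queued), skip target selection instead of throwing an
  // NPE that kills the ReplicationMonitor thread and brings down the NameNode.
  static List<String> chooseTargets(BlockCollectionStub bc) {
    if (bc == null) {
      return Collections.emptyList(); // nothing to replicate for an orphaned block
    }
    // The real code would consult the block placement policy here.
    return List.of("dn-a:50010", "dn-b:50010", "dn-c:50010");
  }
}
{code}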


