hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12638) NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
Date Wed, 11 Oct 2017 14:27:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200361#comment-16200361 ]

Kihwal Lee commented on HDFS-12638:
-----------------------------------

We have also seen blocks with a "null" bc staying in the replication queue. They were missing
blocks, so the replication monitor didn't even try to schedule them and didn't crash. But metaSave
was listing them as orphaned (bc == null, deleted). Other than failing over (to force queue
reinitialization), there was no way to clear them.

In your particular case, we can add a null check in {{scheduleReplication()}} in addition
to the existing deletion check. The missing-block case is a bit trickier: the replication
monitor will not touch those blocks, and nothing will move them to a different priority level,
since the block is already deleted and invalidated on the datanodes. We should prevent such
blocks from being added to the queue in the first place.
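A minimal sketch of the two guards described above, using a simplified, hypothetical model of the replication queue. The classes ReplicationQueueSketch, Block and BlockCollection, the INVALID_ID marker and the scheduleQueued() method are illustrative assumptions only; they are not the real BlockManager, BlockInfo or BlockCollection APIs. Blocks without a resolvable bc are rejected at enqueue time, and the bc null check is repeated alongside the deletion check at scheduling time, so an entry whose file vanished after it was queued is dropped instead of being dereferenced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a replication queue. These are NOT the
// real Hadoop BlockManager / BlockInfo / BlockCollection classes.
public class ReplicationQueueSketch {
    static final long INVALID_ID = -1L; // assumed "deleted" marker for a block's owner id

    // Stand-in for a file's BlockCollection (hypothetical).
    static final class BlockCollection {
        final long inodeId;
        BlockCollection(long inodeId) { this.inodeId = inodeId; }
    }

    // Stand-in for a block: its id plus the id of the owning collection.
    static final class Block {
        final long id;
        long bcId;
        Block(long id, long bcId) { this.id = id; this.bcId = bcId; }
        boolean isDeleted() { return bcId == INVALID_ID; }
    }

    final Map<Long, BlockCollection> inodes = new HashMap<>();
    final Deque<Block> neededReplication = new ArrayDeque<>();

    // Returns null when the owning collection can no longer be resolved.
    BlockCollection getBlockCollection(Block b) {
        return inodes.get(b.bcId);
    }

    // Guard 1: never queue a block that is deleted or has no resolvable bc.
    boolean enqueue(Block b) {
        if (b.isDeleted() || getBlockCollection(b) == null) {
            return false;
        }
        neededReplication.add(b);
        return true;
    }

    // Guard 2 (plays the role of the proposed scheduleReplication() check):
    // re-check at scheduling time, since the bc may vanish after enqueue.
    // Failing entries are removed from the queue rather than dereferenced.
    List<Long> scheduleQueued() {
        List<Long> scheduled = new ArrayList<>();
        while (!neededReplication.isEmpty()) {
            Block b = neededReplication.poll();
            if (b.isDeleted() || getBlockCollection(b) == null) {
                continue; // skip instead of hitting a NullPointerException later
            }
            scheduled.add(b.id);
        }
        return scheduled;
    }

    public static void main(String[] args) {
        ReplicationQueueSketch q = new ReplicationQueueSketch();
        q.inodes.put(100L, new BlockCollection(100L));
        q.inodes.put(101L, new BlockCollection(101L));
        q.enqueue(new Block(1L, 100L));
        q.enqueue(new Block(2L, 200L)); // rejected: no inode 200
        q.enqueue(new Block(3L, 101L));
        q.inodes.remove(101L);          // file removed after block 3 was queued
        System.out.println(q.scheduleQueued()); // prints [1]
    }
}
```

Checking on both sides is a design choice in this sketch: the enqueue guard keeps orphaned entries (like the missing blocks above) from ever reaching the queue, while the scheduling guard covers a bc that disappears between enqueue and scheduling.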

In any case, it is apparent that the new {{isDeleted()}} check cannot replace the bc null
check 100%.  [~jingzhao] any thoughts?
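A minimal, hypothetical illustration of why a deletion flag alone may not be enough: if the block's deletion marker and the bc lookup can ever disagree (simulated here by removing the inode from a map without clearing the block's owner id), an isDeleted()-only guard lets the block through while the lookup returns null. The class, field and method names are assumptions for illustration, and the thread does not establish exactly how the real NameNode reaches such a state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model contrasting a flag-only check with a flag + lookup check.
public class DeletedCheckSketch {
    static final long INVALID_ID = -1L; // assumed "deleted" marker

    static final class Block {
        long bcId; // id of the owning collection, INVALID_ID once marked deleted
        Block(long bcId) { this.bcId = bcId; }
        boolean isDeleted() { return bcId == INVALID_ID; }
    }

    final Map<Long, Object> inodes = new HashMap<>();

    // Returns null when the owner id no longer resolves to an inode.
    Object getBlockCollection(Block b) { return inodes.get(b.bcId); }

    // What a lone isDeleted() guard would accept.
    boolean passesDeletedCheck(Block b) { return !b.isDeleted(); }

    // Deletion check plus an explicit null check on the resolved bc.
    boolean safeToSchedule(Block b) {
        return !b.isDeleted() && getBlockCollection(b) != null;
    }

    public static void main(String[] args) {
        DeletedCheckSketch s = new DeletedCheckSketch();
        s.inodes.put(7L, "file-7");
        Block b = new Block(7L);
        s.inodes.remove(7L); // inode gone, but the block's marker was not cleared
        System.out.println(s.passesDeletedCheck(b)); // prints true
        System.out.println(s.safeToSchedule(b));     // prints false
    }
}
```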


> NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12638
>                 URL: https://issues.apache.org/jira/browse/HDFS-12638
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Jiandan Yang 
>
> Active NameNode exits due to an NPE. I can confirm that the BlockCollection passed in when
> creating ReplicationWork is null, but I do not know why it is null. Looking through the history,
> I found that [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check for
> whether BlockCollection is null.
> NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>         at java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


