Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 12 Oct 2017 02:22:00 +0000 (UTC)
From: "Jiandan Yang  (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13108566.1507721344000.35478.1507774920341@Atlassian.JIRA>
In-Reply-To: <JIRA.13108566.1507721344000@Atlassian.JIRA>
References: <JIRA.13108566.1507721344000@Atlassian.JIRA> <JIRA.13108566.1507721344066@jira-lw-us.apache.org>
Subject: [jira] [Comment Edited] (HDFS-12638) NameNode exits due to
 ReplicationMonitor thread received Runtime exception in
 ReplicationWork#chooseTargets
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Thu, 12 Oct 2017 02:22:06 -0000


    [ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201343#comment-16201343 ] 

Jiandan Yang  edited comment on HDFS-12638 at 10/12/17 2:21 AM:
----------------------------------------------------------------

[~daryn] There is no snapshot directory in the cluster, and we could not find any block info in the log. The audit log does contain entries for truncate commands. The active NN WebUI shows 2000+ missing blocks, but the fsck result does not include any missing replicas. After a restart, the crashed NN successfully became standby.


was (Author: yangjiandan):
[~daryn] There is no snapshot directory in the cluster, and we could not find any block info in the log. The audit log does contain entries for truncate commands. The active NN WebUI shows 2000+ missing blocks, but the fsck result does not include any missing replicas.

> NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12638
>                 URL: https://issues.apache.org/jira/browse/HDFS-12638
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Jiandan Yang 
>
> The active NameNode exited due to an NPE. I can confirm that the BlockCollection passed in when creating ReplicationWork is null, but I do not know why it is null. Looking through the history, I found that [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check for whether the BlockCollection is null.
> The NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
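
The description above attributes the NPE to a null BlockCollection reaching ReplicationWork after a null check was removed. As a rough illustration of that kind of guard, here is a minimal, self-contained sketch. It is not Hadoop code: `OwningFile`, `QueuedBlock`, and `schedulableBlockIds` are hypothetical stand-ins, and it only shows the general pattern of skipping a queued block whose owner is null instead of dereferencing it. It makes no claim about how the actual issue should be fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: none of these types are Hadoop classes.
final class ReplicationGuardSketch {

    /** Hypothetical stand-in for the file (block collection) that owns a block. */
    static final class OwningFile {
        final String path;
        OwningFile(String path) { this.path = path; }
    }

    /** Hypothetical stand-in for a block waiting to be re-replicated. */
    static final class QueuedBlock {
        final long id;
        final OwningFile owner; // null models a block whose owning file is no longer known
        QueuedBlock(long id, OwningFile owner) { this.id = id; this.owner = owner; }
    }

    /** Returns the ids of blocks that are safe to schedule, skipping any block without an owner. */
    static List<Long> schedulableBlockIds(List<QueuedBlock> queue) {
        List<Long> ids = new ArrayList<>();
        for (QueuedBlock block : queue) {
            if (block.owner == null) {
                // Without this guard, dereferencing block.owner later would throw an NPE.
                continue;
            }
            ids.add(block.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<QueuedBlock> queue = Arrays.asList(
            new QueuedBlock(1L, new OwningFile("/data/a")),
            new QueuedBlock(2L, null),
            new QueuedBlock(3L, new OwningFile("/data/b")));
        System.out.println(schedulableBlockIds(queue)); // prints [1, 3]
    }
}
```

In this sketch, an uncaught runtime exception in the scheduling loop is avoided by filtering before any dereference; whether skipping such blocks is the right behavior for the NameNode is a separate question that the sketch does not answer.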

