Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 7A81B200D29 for ; Thu, 12 Oct 2017 04:22:06 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 78E09160BE3; Thu, 12 Oct 2017 02:22:06 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id BF8C71609E5 for ; Thu, 12 Oct 2017 04:22:05 +0200 (CEST) Received: (qmail 56530 invoked by uid 500); 12 Oct 2017 02:22:04 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 56519 invoked by uid 99); 12 Oct 2017 02:22:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 12 Oct 2017 02:22:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id CA0D1C3672 for ; Thu, 12 Oct 2017 02:22:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id q54E5_85GXMU for ; Thu, 12 Oct 2017 02:22:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id B1FAC5F5CC for ; Thu, 12 Oct 2017 02:22:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id E7023E0EFE for ; Thu, 12 Oct 2017 02:22:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 5425D253A1 for ; Thu, 12 Oct 2017 02:22:00 +0000 (UTC) Date: Thu, 12 Oct 2017 02:22:00 +0000 (UTC) From: "Jiandan Yang (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (HDFS-12638) NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 12 Oct 2017 02:22:06 -0000 [ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201343#comment-16201343 ] Jiandan Yang edited comment on HDFS-12638 at 10/12/17 2:21 AM: ---------------------------------------------------------------- [~daryn] There is no snapshot directory in cluster, and we could not find block info in the log, and there are logs of truncate cmd in auditlog. In active NN WebUI there are 2000+ missing blocks, but fsck result do not include missing replicas. And crashed NN become standby successfully by restart. was (Author: yangjiandan): [~daryn] There is no snapshot directory in cluster, and we could not find block info in the log, and there are logs of truncate cmd in auditlog. In active NN WebUI there are 2000+ missing blocks, but fsck result do not include missing replicas. > NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets > ----------------------------------------------------------------------------------------------------------- > > Key: HDFS-12638 > URL: https://issues.apache.org/jira/browse/HDFS-12638 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 2.8.2 > Reporter: Jiandan Yang > > Active NamNode exit due to NPE, I can confirm that the BlockCollection passed in when creating ReplicationWork is null, but I do not know why BlockCollection is null, By view history I found [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] remove judging whether BlockCollection is null. > NN logs are as following: > {code:java} > 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception. > java.lang.NullPointerException > at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744) > at java.lang.Thread.run(Thread.java:834) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org