Date: Fri, 4 Apr 2014 04:44:14 +0000 (UTC)
From: "Haohui Mai (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6180) dead node count / listing is very broken in JMX and old GUI

    [ https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959627#comment-13959627 ]

Haohui Mai commented on HDFS-6180:
----------------------------------

The v0 patch makes the following changes:

# It canonicalizes all entries in the include / exclude lists into IP addresses. It ignores entries that DNS fails to resolve. This is okay because, by default, the NN refuses to register DNs whose IPs and hostnames it cannot resolve.
(See {{dfs.namenode.datanode.registration.ip-hostname-check}} for more details.) The patch keeps the old behavior of performing DNS lookups only when the lists are loaded, so that the DNS overhead is minimized.
# The patch continues to assume that an entry without a port in the list matches all endpoints (i.e., addr:port) that have the IP of the entry. This defines a partial order between endpoints that have the same IP. More concretely, if two endpoints A and B have the same IP, then A <= B iff A.port == B.port or B.port == 0, where port 0 stands for an entry without a port. That way, checking the include and the exclude lists becomes finding the meet or the join elements in the lattice.

> dead node count / listing is very broken in JMX and old GUI
> -----------------------------------------------------------
>
>                 Key: HDFS-6180
>                 URL: https://issues.apache.org/jira/browse/HDFS-6180
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Travis Thompson
>            Assignee: Haohui Mai
>            Priority: Blocker
>         Attachments: HDFS-6180.000.patch, dn.log
>
>
> After bringing up a 578 node cluster with 13 dead nodes, 0 were reported on the new GUI, but showed up properly in the datanodes tab. Some nodes are also being double reported in the deadnode and inservice section (22 show up dead, 565 show up alive, 9 duplicated nodes).
> From /jmx (confirmed that it's the same in jconsole):
> {noformat}
> {
>     "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
>     "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
>     "CapacityTotal" : 5477748687372288,
>     "CapacityUsed" : 24825720407,
>     "CapacityRemaining" : 5477723861651881,
>     "TotalLoad" : 565,
>     "SnapshotStats" : "{\"SnapshottableDirectories\":0,\"Snapshots\":0}",
>     "BlocksTotal" : 21065,
>     "MaxObjects" : 0,
>     "FilesTotal" : 25454,
>     "PendingReplicationBlocks" : 0,
>     "UnderReplicatedBlocks" : 0,
>     "ScheduledReplicationBlocks" : 0,
>     "FSState" : "Operational",
>     "NumLiveDataNodes" : 565,
>     "NumDeadDataNodes" : 0,
>     "NumDecomLiveDataNodes" : 0,
>     "NumDecomDeadDataNodes" : 0,
>     "NumDecommissioningDataNodes" : 0,
>     "NumStaleDataNodes" : 1
> },
> {noformat}
> I'm not going to include deadnode/livenodes because the list is huge, but I've confirmed there are 9 nodes showing up in both deadnodes and livenodes.
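
The port-wildcard ordering described in the comment (for endpoints with the same IP, A <= B iff A.port == B.port or B.port == 0) can be illustrated with a small Java sketch. This is not code from the HDFS-6180 patch: the names EndpointMatchSketch, Endpoint, HostList, lessOrEqual and matches are hypothetical, and the sketch assumes every list entry was already resolved to an IP string when the list was loaded, so no DNS lookup happens during matching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical, not HDFS APIs.
final class EndpointMatchSketch {

    /** An (IP, port) pair. Port 0 stands for a list entry given without a port. */
    static final class Endpoint {
        final String ip;
        final int port;

        Endpoint(String ip, int port) {
            this.ip = ip;
            this.port = port;
        }
    }

    /** a <= b: same IP, and b either names the same port or has no port (0). */
    static boolean lessOrEqual(Endpoint a, Endpoint b) {
        return a.ip.equals(b.ip) && (a.port == b.port || b.port == 0);
    }

    /** An include or exclude list, indexed by IP so a lookup touches few entries. */
    static final class HostList {
        private final Map<String, List<Endpoint>> byIp = new HashMap<>();

        /** Adds an entry; the IP is assumed to be resolved already (at load time). */
        void add(String ip, int port) {
            byIp.computeIfAbsent(ip, k -> new ArrayList<>()).add(new Endpoint(ip, port));
        }

        /** True if some entry in the list is an upper bound of the given endpoint. */
        boolean matches(Endpoint e) {
            List<Endpoint> entries = byIp.getOrDefault(e.ip, Collections.<Endpoint>emptyList());
            for (Endpoint entry : entries) {
                if (lessOrEqual(e, entry)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        HostList exclude = new HostList();
        exclude.add("10.0.0.1", 0);      // entry without a port
        exclude.add("10.0.0.2", 50010);  // entry with an explicit port

        System.out.println(exclude.matches(new Endpoint("10.0.0.1", 50010)));
        System.out.println(exclude.matches(new Endpoint("10.0.0.2", 50020)));
    }
}
```

In this sketch, an entry for 10.0.0.1 without a port covers every endpoint on that IP, while an entry for 10.0.0.2:50010 covers only that exact port. Indexing by IP mirrors the point that the order is only defined between endpoints sharing an IP.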