hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6180) dead node count / listing is very broken in JMX and old GUI
Date Thu, 03 Apr 2014 19:09:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959097#comment-13959097 ]

Haohui Mai commented on HDFS-6180:
----------------------------------

The duplicated entry is added by the following code in {{DatanodeManager}}:

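{code}
    if (listDeadNodes) {
      final EntrySet includedNodes = hostFileManager.getIncludes();
      final EntrySet excludedNodes = hostFileManager.getExcludes();
      for (Entry entry : includedNodes) {
        // An included host that matches no registered datanode and is not
        // excluded gets added to the report as a dead node.
        if ((foundNodes.find(entry) == null) &&
            (excludedNodes.find(entry) == null)) {
          // ... (rest of the excerpt omitted)
        }
      }
    }
{code}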

Note that {{entry}} comes from the include file and therefore carries no port, whereas every entry
in {{foundNodes}} does. When passed an entry without a port, {{find}} is supposed to match it
against the corresponding entry that does have port information.
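
To pin down the expected behavior, here is a minimal sketch of the intended matching rule, assuming IPv4 host entries stored as plain strings of the form {{ip}} or {{ip:port}}. The class {{PortlessMatch}} and its methods are hypothetical; the linear scan is deliberately naive and only serves as a reference for what {{find}} should return, not as a model of {{EntrySet}}.

{code}
import java.util.List;

/**
 * Hypothetical reference for the intended rule: a portless include-file entry
 * "ip" matches a registered key "ip" or "ip:<port>", and nothing else.
 * Assumes IPv4 addresses (an IPv6 address would itself contain ':').
 */
public class PortlessMatch {

  /** True if key is exactly ip, or ip followed by ':' and a port. */
  public static boolean matches(String ip, String key) {
    return key.equals(ip) || key.startsWith(ip + ":");
  }

  /** Linear-scan lookup: slow, but unambiguous about the expected answer. */
  public static String find(List<String> keys, String ip) {
    for (String key : keys) {
      if (matches(ip, key)) {
        return key;
      }
    }
    return null; // no registered entry for this host
  }

  public static void main(String[] args) {
    List<String> registered = List.of("172.18.146.3:1019", "172.18.146.30:1019");
    System.out.println(find(registered, "172.18.146.3"));  // 172.18.146.3:1019
    System.out.println(find(registered, "172.18.146.30")); // 172.18.146.30:1019
    System.out.println(find(registered, "172.18.146.4"));  // null
  }
}
{code}

Appending ":" before the prefix check is what keeps {{172.18.146.3}} from matching {{172.18.146.30:1019}}.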

Internally, {{find}} is backed by a {{TreeMap}} keyed by {{ip}} or {{ip:port}}. Because an entry
with a port sorts lexicographically after the same entry without one, {{find}} implements the
port-matching rule by checking whether the next entry in the map has the same id. This heuristic
is unreliable: it returns wrong results for the following entries:

{noformat}
172.18.146.3:1019
172.18.146.30:1019
{noformat}

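Calling {{find(172.18.146.3)}} checks {{172.18.146.30:1019}} instead of {{172.18.146.3:1019}}:
'0' (0x30) sorts before ':' (0x3A), so {{172.18.146.30:1019}} lands between {{172.18.146.3}} and
{{172.18.146.3:1019}} in the map. The lookup then reports no match, and the live node
{{172.18.146.3}} is also listed as dead, which is the bug.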
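
The ordering problem can be reproduced with a plain {{java.util.TreeMap}}. The sketch below is a hypothetical reconstruction, not the {{HostFileManager}} source: it assumes IPv4 keys of the form {{ip}} or {{ip:port}}, and it models "checks whether the next entry has the same id" as comparing the IP part of {{ceilingEntry(ip)}}. {{findFixed}} shows one possible correction (try the exact key, then search from {{ip + ":"}}); it is an illustration, not necessarily the patch that will land for this issue.

{code}
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of a TreeMap lookup keyed by "ip" or "ip:port" (IPv4 only).
 * It reproduces the ordering problem; it is not the HostFileManager code.
 */
public class CeilingLookup {

  /** Heuristic: assume the first key >= ip is ip's own entry, then compare IPs. */
  public static String findHeuristic(TreeMap<String, String> index, String ip) {
    Map.Entry<String, String> next = index.ceilingEntry(ip);
    if (next == null) {
      return null;
    }
    String key = next.getKey();
    int colon = key.indexOf(':');
    String keyIp = colon < 0 ? key : key.substring(0, colon); // strip ":port"
    return keyIp.equals(ip) ? next.getValue() : null;
  }

  /** Possible fix: try the exact key, then look for the first "ip:" key. */
  public static String findFixed(TreeMap<String, String> index, String ip) {
    String exact = index.get(ip);
    if (exact != null) {
      return exact;
    }
    // All keys that start with prefix form one contiguous range beginning at
    // prefix, so the ceiling of prefix is an "ip:port" key whenever one exists.
    String prefix = ip + ":";
    Map.Entry<String, String> next = index.ceilingEntry(prefix);
    if (next != null && next.getKey().startsWith(prefix)) {
      return next.getValue();
    }
    return null;
  }

  public static void main(String[] args) {
    TreeMap<String, String> index = new TreeMap<>();
    index.put("172.18.146.3:1019", "dn-a");
    index.put("172.18.146.30:1019", "dn-b");

    // '0' (0x30) < ':' (0x3A), so the .30 key sorts before "172.18.146.3:1019".
    System.out.println(index.ceilingKey("172.18.146.3"));     // 172.18.146.30:1019
    System.out.println(findHeuristic(index, "172.18.146.3")); // null (the bug)
    System.out.println(findFixed(index, "172.18.146.3"));     // dn-a
  }
}
{code}

With the heuristic, a live node at {{172.18.146.3}} looks unregistered, so the dead-node loop in {{DatanodeManager}} would report it a second time.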

The bug can be quite confusing from the end user's perspective, so I'd like to move forward
as quickly as possible.

[~kamrul], are you working on it? If not I can work on a patch later today.

> dead node count / listing is very broken in JMX and old GUI
> -----------------------------------------------------------
>
>                 Key: HDFS-6180
>                 URL: https://issues.apache.org/jira/browse/HDFS-6180
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Travis Thompson
>            Assignee: Haohui Mai
>         Attachments: dn.log
>
>
> After bringing up a 578-node cluster with 13 dead nodes, 0 dead nodes were reported on the new GUI,
> but they showed up properly in the datanodes tab. Some nodes are also being double-reported in
> the deadnode and inservice sections (22 show up dead, 565 show up alive, 9 duplicated nodes).

> From /jmx (confirmed that it's the same in jconsole):
> {noformat}
> {
>     "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
>     "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
>     "CapacityTotal" : 5477748687372288,
>     "CapacityUsed" : 24825720407,
>     "CapacityRemaining" : 5477723861651881,
>     "TotalLoad" : 565,
>     "SnapshotStats" : "{\"SnapshottableDirectories\":0,\"Snapshots\":0}",
>     "BlocksTotal" : 21065,
>     "MaxObjects" : 0,
>     "FilesTotal" : 25454,
>     "PendingReplicationBlocks" : 0,
>     "UnderReplicatedBlocks" : 0,
>     "ScheduledReplicationBlocks" : 0,
>     "FSState" : "Operational",
>     "NumLiveDataNodes" : 565,
>     "NumDeadDataNodes" : 0,
>     "NumDecomLiveDataNodes" : 0,
>     "NumDecomDeadDataNodes" : 0,
>     "NumDecommissioningDataNodes" : 0,
>     "NumStaleDataNodes" : 1
>   },
> {noformat}
> I'm not going to include deadnode/livenodes because the lists are huge, but I've confirmed
> there are 9 nodes showing up in both deadnodes and livenodes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
