hadoop-hdfs-issues mailing list archives

From "Travis Thompson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6180) dead node count / listing is very broken in JMX and old GUI
Date Wed, 02 Apr 2014 01:18:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957224#comment-13957224
] 

Travis Thompson commented on HDFS-6180:
---------------------------------------

After looking at it for a while, it seems like a bigger issue than initially thought.  The
last digit of the last octet is being truncated.

Example:
|| LIVE NODE || LIVE & DEAD NODES ||
| datanode5464 (xxx.xxx.138.244) | datanode5391 (xxx.xxx.138.24) |
| datanode5486 (xxx.xxx.138.222) | datanode5392 (xxx.xxx.138.22) |
| datanode5477 (xxx.xxx.138.233) | datanode5393 (xxx.xxx.138.23) |
| datanode5601 (xxx.xxx.139.244) | datanode5526 (xxx.xxx.139.24) |

Starting {{datanode5464}} will force {{datanode5391}} to become both live and dead if it's
running.  The order in which they pair up doesn't matter; the same effect will happen in the
end.
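
The pairing in the table above can be reproduced with a small sketch. It only models the
symptom (an address losing its final character and then matching a different node); it is not
the NameNode code, and the names {{truncate_last_digit}}, {{find_collisions}} and
{{EXAMPLE_NODES}} are hypothetical. The addresses are the masked ones from the table, treated
as plain strings.

```python
from typing import Dict, List, Tuple

# Masked addresses copied from the table above; treated purely as strings.
EXAMPLE_NODES: Dict[str, str] = {
    "datanode5464": "xxx.xxx.138.244",
    "datanode5486": "xxx.xxx.138.222",
    "datanode5477": "xxx.xxx.138.233",
    "datanode5601": "xxx.xxx.139.244",
    "datanode5391": "xxx.xxx.138.24",
    "datanode5392": "xxx.xxx.138.22",
    "datanode5393": "xxx.xxx.138.23",
    "datanode5526": "xxx.xxx.139.24",
}


def truncate_last_digit(ip: str) -> str:
    """Drop the final character of the address string (assumed model of the bug)."""
    return ip[:-1]


def find_collisions(nodes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (truncated_host, victim_host) pairs.

    A pair is reported when truncating one node's address yields exactly
    the address of a different node.
    """
    host_by_ip = {ip: host for host, ip in nodes.items()}
    pairs = []
    for host, ip in nodes.items():
        victim = host_by_ip.get(truncate_last_digit(ip))
        if victim is not None and victim != host:
            pairs.append((host, victim))
    return sorted(pairs)


if __name__ == "__main__":
    for truncated, victim in find_collisions(EXAMPLE_NODES):
        print(truncated, "collides with", victim)
```

Run against the table data, this reports the same four pairs listed above, regardless of the
order in which the nodes appear in the input.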

> dead node count / listing is very broken in JMX and old GUI
> -----------------------------------------------------------
>
>                 Key: HDFS-6180
>                 URL: https://issues.apache.org/jira/browse/HDFS-6180
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Travis Thompson
>         Attachments: dn.log
>
>
> After bringing up a 578-node cluster with 13 dead nodes, 0 dead nodes were reported on the new GUI,
> although they showed up properly in the datanodes tab.  Some nodes are also being double reported in
> the deadnode and inservice sections (22 show up dead, 565 show up alive, 9 duplicated nodes).

> From /jmx (confirmed that it's the same in jconsole):
> {noformat}
> {
>     "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
>     "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
>     "CapacityTotal" : 5477748687372288,
>     "CapacityUsed" : 24825720407,
>     "CapacityRemaining" : 5477723861651881,
>     "TotalLoad" : 565,
>     "SnapshotStats" : "{\"SnapshottableDirectories\":0,\"Snapshots\":0}",
>     "BlocksTotal" : 21065,
>     "MaxObjects" : 0,
>     "FilesTotal" : 25454,
>     "PendingReplicationBlocks" : 0,
>     "UnderReplicatedBlocks" : 0,
>     "ScheduledReplicationBlocks" : 0,
>     "FSState" : "Operational",
>     "NumLiveDataNodes" : 565,
>     "NumDeadDataNodes" : 0,
>     "NumDecomLiveDataNodes" : 0,
>     "NumDecomDeadDataNodes" : 0,
>     "NumDecommissioningDataNodes" : 0,
>     "NumStaleDataNodes" : 1
>   },
> {noformat}
> I'm not going to include deadnode/livenodes because the list is huge, but I've confirmed
> there are 9 nodes showing up in both deadnodes and livenodes.
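
One way to check for double-reported nodes is to intersect the two listings. The sketch below
assumes the live and dead listings are each available as a JSON object keyed by hostname; that
shape, and the function names {{nodes_in_both}} and {{distinct_node_count}}, are assumptions
made for illustration and are not taken from the output above.

```python
import json
from typing import List


def nodes_in_both(live_json: str, dead_json: str) -> List[str]:
    """Return hostnames that appear as keys in both JSON objects."""
    live = json.loads(live_json)
    dead = json.loads(dead_json)
    return sorted(set(live) & set(dead))


def distinct_node_count(num_live: int, num_dead: int, num_duplicated: int) -> int:
    """Count distinct nodes when some are listed as both live and dead."""
    return num_live + num_dead - num_duplicated


if __name__ == "__main__":
    # Figures from the report: 565 alive, 22 dead, 9 listed in both.
    print(distinct_node_count(565, 22, 9))
```

With the figures from the report (565 alive, 22 dead, 9 duplicated), the count comes out to
578, which matches the cluster size.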



--
This message was sent by Atlassian JIRA
(v6.2#6252)
