hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3990) NN's health report has severe performance problems
Date Wed, 17 Oct 2012 19:22:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478271#comment-13478271 ]

Eli Collins commented on HDFS-3990:

I think the approach in the latest patch should work. Once HDFS-4068 is in, you can rebase on it
and remove all the cleanup.

- We can remove the dnAddress check for null now that it looks like NNThroughputBenchmark
always uses RPC 
- Rename getNodeNames to something more explicit like getNodeNamesForHostFiltering?
- Rather than have updateNodeAddr, let's use the two setters explicitly; it's easier to follow
the registration behavior (i.e. we explicitly clobber the IP and peer hostname). Hopefully we'll
eventually be able to make DatanodeID immutable so we don't update it in place.
- Let's update getNodeNames to include the DN hostname since that is the current behavior,
and file a separate jira for removing the use of the DN reported hostname here (or perhaps
removing the reported DN field entirely)
- Let's update hashCode in a separate change. I think this will need some additional changes,
like modifying Host2NodesMap to use DatanodeID#hashCode; it currently explicitly uses the
IP addr for the hash and ignores DatanodeID#hashCode.
- Add a javadoc to testDNSLookups indicating that we're testing that the NN does *not* do
DNS lookups after registration
- Nit, I'd create the SM inline via "System.setSecurityManager(new SecurityManager() {" so
it's clear it's only associated with this DNS test (like TestDFSShell, e.g.)
- Nit, rename "lookups" in the test to "initialLookups"
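
To illustrate the testDNSLookups points above, here is a minimal sketch of a test that records the lookup count after registration as "initialLookups" and then checks it does not grow. It is not HDFS code: RegistrationLookupSketch, CountingResolver and Registry are hypothetical names, and an injectable counting resolver stands in for the inline SecurityManager suggested above, since java.lang.SecurityManager has been deprecated for removal since Java 17.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for the NN's datanode bookkeeping; not HDFS code.
class RegistrationLookupSketch {

  /** Fake reverse-DNS resolver that counts how often it is called. */
  static final class CountingResolver implements Function<String, String> {
    final AtomicInteger lookups = new AtomicInteger();

    @Override
    public String apply(String ip) {
      lookups.incrementAndGet();
      return "host-" + ip.replace('.', '-'); // made-up hostname, no real DNS
    }
  }

  /** Resolves each node once at registration and caches the result. */
  static final class Registry {
    private final Function<String, String> resolver;
    private final Map<String, String> hostByIp = new LinkedHashMap<>();

    Registry(Function<String, String> resolver) {
      this.resolver = resolver;
    }

    synchronized void register(String ip) {
      hostByIp.put(ip, resolver.apply(ip)); // the only place a lookup happens
    }

    /** Builds the report from cached hostnames only. */
    synchronized List<String> healthReport() {
      List<String> lines = new ArrayList<>();
      for (Map.Entry<String, String> e : hostByIp.entrySet()) {
        lines.add(e.getKey() + " " + e.getValue());
      }
      return lines;
    }
  }

  /** Tests that the registry does *not* do lookups after registration. */
  public static void main(String[] args) {
    CountingResolver resolver = new CountingResolver();
    Registry registry = new Registry(resolver);
    registry.register("10.0.0.1");
    registry.register("10.0.0.2");

    int initialLookups = resolver.lookups.get();
    for (int i = 0; i < 5; i++) {
      registry.healthReport();
    }
    if (resolver.lookups.get() != initialLookups) {
      throw new AssertionError("report triggered additional lookups");
    }
    System.out.println("lookups after registration: "
        + (resolver.lookups.get() - initialLookups));
  }
}
```

Naming the baseline "initialLookups" makes it explicit that the assertion is about lookups after registration, not about the total count.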
> NN's health report has severe performance problems
> --------------------------------------------------
>                 Key: HDFS-3990
>                 URL: https://issues.apache.org/jira/browse/HDFS-3990
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt
>
> The dfshealth page places a read lock on the namespace while it does a DNS lookup
> for every DN.  On a multi-thousand node cluster, this often results in 10s+ load times for
> the health page.  10 concurrent requests were found to cause 7m+ load times, during which
> write operations blocked.
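
The locking problem described in the issue summary can be sketched as follows. This is a simplified illustration, not NameNode code: HealthReportLockSketch and its methods are hypothetical, the resolver is whatever function the caller passes in, and the snapshot variant is only one possible way to keep slow lookups out of the critical section, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical model of a namespace lock guarding a datanode list.
class HealthReportLockSketch {
  private final ReentrantReadWriteLock namespaceLock = new ReentrantReadWriteLock();
  private final List<String> datanodeIps = new ArrayList<>();

  void addDatanode(String ip) {
    namespaceLock.writeLock().lock();
    try {
      datanodeIps.add(ip);
    } finally {
      namespaceLock.writeLock().unlock();
    }
  }

  /** Problem pattern: every (possibly slow) lookup runs under the read lock. */
  List<String> reportHoldingLock(Function<String, String> resolver) {
    namespaceLock.readLock().lock();
    try {
      List<String> out = new ArrayList<>();
      for (String ip : datanodeIps) {
        out.add(resolver.apply(ip)); // writers wait for all of these
      }
      return out;
    } finally {
      namespaceLock.readLock().unlock();
    }
  }

  /** Alternative: copy the addresses under the lock, resolve after releasing it. */
  List<String> reportWithSnapshot(Function<String, String> resolver) {
    List<String> snapshot;
    namespaceLock.readLock().lock();
    try {
      snapshot = new ArrayList<>(datanodeIps);
    } finally {
      namespaceLock.readLock().unlock();
    }
    List<String> out = new ArrayList<>();
    for (String ip : snapshot) {
      out.add(resolver.apply(ip)); // no lock held here
    }
    return out;
  }

  /** Number of read locks currently held, exposed for observation only. */
  int readLocksHeld() {
    return namespaceLock.getReadLockCount();
  }
}
```

With a resolver that calls readLocksHeld(), the first method reports a held read lock during each lookup and the second reports none, which is the difference that matters for blocked writers.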

