hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3990) NN's health report has severe performance problems
Date Mon, 03 Dec 2012 17:16:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508869#comment-13508869 ]

Chris Nauroth commented on HDFS-3990:
-------------------------------------

Daryn and Eli, we merged this change to branch-trunk-win on Friday, 11/30.  Unfortunately,
it had the unintended side effect of breaking datanode registration on Windows, at least for
single-node developer setups, because the change rejects registration of any data node whose
address does not resolve to a host name:

{code}
  public void registerDatanode(DatanodeRegistration nodeReg)
      throws DisallowedDatanodeException {
    InetAddress dnAddress = Server.getRemoteIp();
    if (dnAddress != null) {
      // Mostly called inside an RPC, update ip and peer hostname
      String hostname = dnAddress.getHostName();
      String ip = dnAddress.getHostAddress();
      // getHostName() returns the textual IP when no host name can be found
      if (hostname.equals(ip)) {
        LOG.warn("Unresolved datanode registration from " + ip);
        throw new DisallowedDatanodeException(nodeReg);
      }
{code}

On Windows, 127.0.0.1 does not resolve to localhost; the reported host name is "127.0.0.1".
Therefore, on Windows, every datanode registration is rejected when running in
pseudo-distributed mode or in MiniDFSCluster-based tests.  (See HADOOP-8414 for more discussion
of the particulars of resolving 127.0.0.1 on Windows.)
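
To make the failure mode concrete, here is a minimal standalone sketch of the same comparison. The class and method names are hypothetical and not part of Hadoop. It relies on InetAddress.getByAddress(host, addr), which skips the name service, so the first two results do not depend on the platform; the final lookup is the one that I'd expect to differ between Windows and other systems.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; not Hadoop code.
final class UnresolvedCheck {
  /** Same comparison as the hostname.equals(ip) test in the snippet above. */
  static boolean isUnresolved(String hostname, String ip) {
    return hostname.equals(ip);
  }

  /** Convenience overload that pulls both strings from an InetAddress. */
  static boolean isUnresolved(InetAddress addr) {
    // May trigger a reverse lookup if the address carries no host name yet.
    return isUnresolved(addr.getHostName(), addr.getHostAddress());
  }

  public static void main(String[] args) throws UnknownHostException {
    byte[] loopback = {127, 0, 0, 1};

    // Host name supplied explicitly, so no name service is consulted.
    InetAddress named = InetAddress.getByAddress("localhost", loopback);
    InetAddress literal = InetAddress.getByAddress("127.0.0.1", loopback);
    System.out.println("named unresolved:   " + isUnresolved(named));
    System.out.println("literal unresolved: " + isUnresolved(literal));

    // Real reverse lookup: the result depends on the platform and its
    // resolver configuration, so no particular output is assumed here.
    InetAddress real = InetAddress.getByAddress(loopback);
    System.out.println("reverse lookup gave: " + real.getHostName());
  }
}
```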

Potential fixes I can think of:

# Add special-case logic to allow registration if ip.equals("127.0.0.1").  This is the quick
fix I applied to my dev environment to unblock myself last Friday.
# Check NetUtils.getStaticResolution before rejecting, and register the mapping with
NetUtils.addStaticResolution("127.0.0.1", "localhost") somewhere at initialization time.
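
A rough sketch of how the two options would differ, assuming nothing beyond what is described above. The class, the map, and the method names are hypothetical stand-ins; in particular the map only imitates the idea of NetUtils static resolutions and does not call the real NetUtils.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not Hadoop code.
final class RegistrationPolicy {
  // Stand-in for a static-resolution table (assumption: ip -> host name).
  private static final Map<String, String> STATIC_RESOLUTIONS =
      new ConcurrentHashMap<>();

  static void addStaticResolution(String ip, String hostname) {
    STATIC_RESOLUTIONS.put(ip, hostname);
  }

  static String getStaticResolution(String ip) {
    return STATIC_RESOLUTIONS.get(ip);
  }

  /** Option 1: accept resolved nodes, plus the IPv4 loopback literal. */
  static boolean allowWithLoopbackSpecialCase(String hostname, String ip) {
    return !hostname.equals(ip) || ip.equals("127.0.0.1");
  }

  /** Option 2: accept resolved nodes, plus any IP with a static resolution. */
  static boolean allowWithStaticResolution(String hostname, String ip) {
    return !hostname.equals(ip) || getStaticResolution(ip) != null;
  }

  public static void main(String[] args) {
    // Windows-style result: the host name is just the IP literal.
    String hostname = "127.0.0.1";
    String ip = "127.0.0.1";

    System.out.println("option 1: " + allowWithLoopbackSpecialCase(hostname, ip));
    System.out.println("option 2 before init: "
        + allowWithStaticResolution(hostname, ip));

    // Registered once at initialization time, as proposed in option 2.
    addStaticResolution("127.0.0.1", "localhost");
    System.out.println("option 2 after init:  "
        + allowWithStaticResolution(hostname, ip));
  }
}
```

The trade-off, as far as I can tell, is that option 1 hard-codes a single address in the registration path, while option 2 keeps the registration check generic but depends on the mapping being registered before the first datanode connects.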

Do you have an opinion on the best way to fix it?  I have a Windows VM ready to go, so I can
code the patch and test.

                
> NN's health report has severe performance problems
> --------------------------------------------------
>
>                 Key: HDFS-3990
>                 URL: https://issues.apache.org/jira/browse/HDFS-3990
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: HDFS-3990.branch-0.23.patch, HDFS-3990.branch-0.23.patch, HDFS-3990.patch,
> HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch,
> HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt
>
>
> The dfshealth page will place a read lock on the namespace while it does a dns lookup
> for every DN.  On a multi-thousand node cluster, this often results in 10s+ load time for
> the health page.  10 concurrent requests were found to cause 7m+ load times during which time
> write operations blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
