hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3990) NN's health report has severe performance problems
Date Mon, 03 Dec 2012 17:16:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508869#comment-13508869

Chris Nauroth commented on HDFS-3990:

Daryn and Eli, we merged this change to branch-trunk-win on Friday, 11/30.  Unfortunately,
it had the unintended side effect of breaking Windows, at least for single-node developer
setups, because of the code change that rejects registration of an unresolved datanode:

  public void registerDatanode(DatanodeRegistration nodeReg)
      throws DisallowedDatanodeException {
    InetAddress dnAddress = Server.getRemoteIp();
    if (dnAddress != null) {
      // Mostly called inside an RPC, update ip and peer hostname
      String hostname = dnAddress.getHostName();
      String ip = dnAddress.getHostAddress();
      if (hostname.equals(ip)) {
        // Reverse lookup gave back only the textual IP: treat the node as unresolved
        LOG.warn("Unresolved datanode registration from " + ip);
        throw new DisallowedDatanodeException(nodeReg);
      }
      // ... (remainder of the method not quoted)
    }
    // ...
  }

On Windows, the loopback address does not resolve to localhost: the host name is reported as
the textual IP address itself, so hostname.equals(ip) is true.  Therefore, on Windows, the
NameNode always rejects the datanode registrations when running in pseudo-distributed mode or
running MiniDFSCluster-based tests.  (See HADOOP-8414 for more discussion of the particulars
of resolving the loopback address on Windows.)
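
For anyone reproducing this without a Windows box, here is a minimal, self-contained sketch of the comparison the check relies on. It assumes only java.net.InetAddress; the class name UnresolvedCheckDemo and the dn5.example.com / 10.0.0.5 values are made up for illustration. InetAddress.getByAddress(host, addr) records the given host name without doing a DNS lookup, so both outcomes can be simulated deterministically.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class UnresolvedCheckDemo {
    // Same comparison as the registerDatanode() excerpt: a node counts as
    // "unresolved" when its host name is just its textual IP address.
    static boolean looksUnresolved(InetAddress addr) {
        return addr.getHostName().equals(addr.getHostAddress());
    }

    public static void main(String[] args) throws UnknownHostException {
        byte[] ip = {10, 0, 0, 5};
        // getByAddress(host, addr) stores the host name as given, so
        // getHostName() returns it without consulting a name service.
        InetAddress resolved = InetAddress.getByAddress("dn5.example.com", ip);
        InetAddress unresolved = InetAddress.getByAddress("10.0.0.5", ip);
        System.out.println(looksUnresolved(resolved));   // false
        System.out.println(looksUnresolved(unresolved)); // true
    }
}
```

Passing the textual IP as the host name mimics what the Windows loopback case looks like to the registration check.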

Potential fixes I can think of:

# Add special-case logic to allow registration when ip is the loopback address.  This is the quick
fix I applied to my dev environment to unblock myself last Friday.
# Add a check against NetUtils.getStaticResolution, and register a static resolution from the loopback
address to "localhost" with NetUtils.addStaticResolution somewhere at initialization time.

Do you have an opinion on the best way to fix it?  I have a Windows VM ready to go, so I can
code the patch and test.
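
A rough sketch of what option 1 could look like, assuming the check is factored into a helper that takes an InetAddress (RegistrationCheckSketch and shouldReject are hypothetical names, not existing Hadoop code). It uses InetAddress.isLoopbackAddress() rather than comparing against a hard-coded address string, which would also cover the IPv6 loopback.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RegistrationCheckSketch {
    // Hypothetical option-1 variant of the registration check:
    // an unresolved address (host name == textual IP) is still rejected,
    // unless it is a loopback address, as in single-node setups on Windows.
    static boolean shouldReject(InetAddress addr) {
        boolean unresolved = addr.getHostName().equals(addr.getHostAddress());
        return unresolved && !addr.isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        // Simulate the Windows behavior: the loopback address comes back
        // with its textual IP as the host name.
        InetAddress windowsLoopback = InetAddress.getByAddress(lo.getHostAddress(), lo.getAddress());
        byte[] remoteIp = {10, 0, 0, 5};
        InetAddress unresolvedRemote = InetAddress.getByAddress("10.0.0.5", remoteIp);
        InetAddress resolvedRemote = InetAddress.getByAddress("dn5.example.com", remoteIp);

        System.out.println(shouldReject(windowsLoopback));  // false: allowed by the special case
        System.out.println(shouldReject(unresolvedRemote)); // true: still rejected
        System.out.println(shouldReject(resolvedRemote));   // false: normal registration
    }
}
```

Option 2 would instead leave the rejection logic alone and make the lookup itself return "localhost", which avoids a special case in the check but depends on the static resolution being registered before any datanode registers.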

> NN's health report has severe performance problems
> --------------------------------------------------
>                 Key: HDFS-3990
>                 URL: https://issues.apache.org/jira/browse/HDFS-3990
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>         Attachments: HDFS-3990.branch-0.23.patch, HDFS-3990.branch-0.23.patch, HDFS-3990.patch,
>                      HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch,
>                      HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt
> The dfshealth page places a read lock on the namespace while it does a DNS lookup
> for every DN.  On a multi-thousand-node cluster, this often results in 10s+ load times for
> the health page.  10 concurrent requests were found to cause 7m+ load times, during which
> write operations were blocked.
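
The registerDatanode() excerpt above resolves the peer host name once, at registration time. Below is a minimal sketch of why that matters for the health page, assuming a plain ReentrantReadWriteLock stands in for the namespace lock and a counting function stands in for reverse DNS (HealthReportSketch and its methods are hypothetical, not the actual HDFS-3990 patch). The report only reads cached names while holding the read lock, so repeated page loads trigger no lookups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

public class HealthReportSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // ip -> host name, resolved once when the node registers
    private final Map<String, String> datanodes = new LinkedHashMap<>();
    private final Function<String, String> resolver;

    HealthReportSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    void register(String ip) {
        // The (possibly slow) lookup happens before taking the lock.
        String hostname = resolver.apply(ip);
        lock.writeLock().lock();
        try {
            datanodes.put(ip, hostname);
        } finally {
            lock.writeLock().unlock();
        }
    }

    List<String> healthReport() {
        lock.readLock().lock();
        try {
            // Only cached values are read while the lock is held: no lookups here.
            List<String> rows = new ArrayList<>();
            for (Map.Entry<String, String> e : datanodes.entrySet()) {
                rows.add(e.getValue() + " (" + e.getKey() + ")");
            }
            return rows;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        // Hypothetical resolver standing in for a reverse DNS lookup.
        HealthReportSketch nn = new HealthReportSketch(ip -> {
            lookups.incrementAndGet();
            return "host-" + ip.replace('.', '-');
        });
        nn.register("10.0.0.1");
        nn.register("10.0.0.2");
        for (int i = 0; i < 10; i++) {
            nn.healthReport();
        }
        System.out.println(nn.healthReport()); // [host-10-0-0-1 (10.0.0.1), host-10-0-0-2 (10.0.0.2)]
        System.out.println(lookups.get());     // 2
    }
}
```

The trade-off is that a node whose name cannot be resolved at registration has nothing useful to cache, which is the situation the rejection check guards against.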

This message is automatically generated by JIRA.
