hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3694) if MiniDFS startup time could be improved, testing time would be reduced
Date Wed, 09 Jul 2008 12:11:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612019#action_12612019 ]

Steve Loughran commented on HADOOP-3694:
----------------------------------------

Assuming this is the cause (and the same stack trace comes up again and again), this is
what the code is trying to do:

DataNode.startDataNode():

    InetSocketAddress ipcAddr = NetUtils.createSocketAddr(        // 1
        conf.get("dfs.datanode.ipc.address"));
    String hostname = ipcAddr.getHostName();                      // 2
    ipcServer = RPC.getServer(this, hostname, ipcAddr.getPort(),  // 3
        conf.getInt("dfs.datanode.handler.count", 3), false, conf);

(1) get a socket address from dfs.datanode.ipc.address, which defaults to "0.0.0.0:50020"
(2) get the real hostname of that socket address
(3) open an RPC server on that hostname and port.
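
For what it's worth, step (1) can be exercised on its own, outside the datanode; a minimal
probe (my own throwaway class, not anything in the codebase) would look roughly like this:

    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.net.NetUtils;

    // Throwaway probe: resolve the datanode IPC address the same way
    // startDataNode() does and print what comes back.
    public class IpcAddrProbe {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // the "0.0.0.0:50020" default (normally from hadoop-default.xml) supplied inline here
            String target = conf.get("dfs.datanode.ipc.address", "0.0.0.0:50020");
            InetSocketAddress ipcAddr = NetUtils.createSocketAddr(target);
            System.out.println("resolved to: " + ipcAddr);
        }
    }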


Inside NetUtils.createSocketAddr, the configuration string is parsed and the (hostname, port)
values are extracted. The hostname is then turned into a new address, in one of two ways
(a simplified sketch follows the two cases):

1. If there is a static hostname -> hostname' mapping, that is used:

    if (getStaticResolution(hostname) != null) {
      hostname = getStaticResolution(hostname);
    }

2. Otherwise the OS/JVM resolves the address:
    return new InetSocketAddress(hostname, port);
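
Boiled down, the whole flow is roughly the following. This is a sketch only, not the real
NetUtils source; URI-style targets and the static-resolution table are omitted, and the class
name is mine:

    import java.net.InetSocketAddress;

    public final class CreateSocketAddrSketch {
        // Parse "host:port" and hand the hostname to the resolver, as described above.
        static InetSocketAddress createSocketAddr(String target, int defaultPort) {
            int colon = target.indexOf(':');
            String hostname = (colon < 0) ? target : target.substring(0, colon);
            int port = (colon < 0) ? defaultPort
                                   : Integer.parseInt(target.substring(colon + 1));
            // With no static mapping, the OS/JVM resolver picks the address;
            // on this box it appears to be handing back an IPv6 one.
            return new InetSocketAddress(hostname, port);
        }
    }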

Somehow this is picking up an IPv6 address.

Later, when ipcAddr.getHostName() is called (in (2)), an attempt is made to reverse-DNS that
address. Unless your site is running IPv6 DNS, this isn't going to succeed, but you take a
15-30s hit every time the attempt is made.
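
That stall is easy to reproduce in isolation. Something like the following, which is
illustrative only and reuses the eth0 link-local address from the ifconfig output below,
should show the same multi-second pause unless reverse IPv6 DNS is set up:

    import java.net.InetSocketAddress;

    public class ReverseLookupTimer {
        public static void main(String[] args) {
            // A link-local IPv6 address typically has no PTR record, so
            // getHostName() blocks until the resolver gives up.
            InetSocketAddress addr =
                new InetSocketAddress("fe80::21c:c4ff:fe17:cc46", 50020);
            long start = System.currentTimeMillis();
            String host = addr.getHostName();  // the reverse-DNS attempt happens here
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(host + " after " + elapsed + " ms");
        }
    }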

I'm going to look at removing IPv6 from this machine, which has one real and two virtual
interfaces as well as loopback, to see if that makes the problem go away, or at least brings
some mild improvement:

eth0      Link encap:Ethernet  HWaddr 00:1C:C4:17:CC:46  
          inet addr:16.XX.XX.XXX Bcast:16.XX.XX.255  Mask:255.255.252.0
          inet6 addr: fe80::21c:c4ff:fe17:cc46/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1561368 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12199689 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:325311436 (310.2 MB)  TX bytes:2108807940 (1.9 GB)
          Interrupt:17 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2538947 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2538947 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:881374392 (840.5 MB)  TX bytes:881374392 (840.5 MB)

vmnet1    Link encap:Ethernet  HWaddr 00:50:56:C0:00:01  
          inet addr:192.168.66.1  Bcast:192.168.66.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1151 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

vmnet8    Link encap:Ethernet  HWaddr 00:50:56:C0:00:08  
          inet addr:192.168.142.1  Bcast:192.168.142.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1151 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
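
If stripping IPv6 from the box turns out to be awkward, another option worth trying (not
tested here) is telling the JVM to prefer the IPv4 stack, which should keep IPv6 addresses
out of the resolution path entirely:

    // Either pass -Djava.net.preferIPv4Stack=true on the java command line,
    // or set it programmatically before any networking classes are loaded:
    System.setProperty("java.net.preferIPv4Stack", "true");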





> if MiniDFS startup time could be improved, testing time would be reduced
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3694
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3694
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: test
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>
> It's taking me 140 minutes to run a test build; looking into the test results, it's the
> 20s startup delay of every MiniDFS cluster that is slowing things down. If we could find out
> why it is taking so long and cut it down, every test case that relies on a cluster would be
> sped up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

