hadoop-common-user mailing list archives

From Nick Rathke <n...@sci.utah.edu>
Subject Re: Running Hadoop on cluster with NFS booted systems
Date Tue, 29 Sep 2009 02:21:45 GMT
FYI, I get the same hanging behavior if I follow the Hadoop quick start
for a single-node baseline configuration (no modified conf files).

-Nick


Brian Bockelman wrote:
> Hey Nick,
>
> Do you have any error messages appearing in the log files?
>
> Brian
>
> On Sep 28, 2009, at 2:06 PM, Nick Rathke wrote:
>
>> Ted Dunning wrote:
>>> I think that the last time you asked this question, the suggestion was to
>>> look at DNS and make sure that everything is exactly correct in the
>>> net-boot configuration. Hadoop is very sensitive to network routing and
>>> naming details.
>>>
>>> So,
>>>
>>> a) in your net-boot, how are IP addresses assigned?
>>>
>> We assign static IPs based on a node's MAC address via DHCP, so that
>> when a node is netbooted or booted with a local OS it gets the same
>> IP and hostname.
>>> b) how are DNS names propagated?
>>>
>> Cluster DNS names are mixed in with our facility DNS servers.
>> All nodes have proper forward and reverse DNS lookups.
>>> c) how have you guaranteed that (a) and (b) are exactly consistent?
>>>
>> By host MAC address. I have also manually confirmed this.
>>> d) how have you guaranteed that every node can talk to every other node
>>> both by name and IP address?
>>>
>> Local cluster DNS / DHCP, plus all nodes have all other nodes' host names
>> and IPs in /etc/hosts. I have compared all the config files for DNS,
>> DHCP, and /etc/hosts to make sure all information is the same.
>>> e) have you assured yourself that any reverse mapping that exists is
>>> correct?
>>>
>> Yes, and tested.
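
The forward and reverse checks in (b) through (e) could be scripted along these lines and run on every node, netbooted or not. This is a minimal sketch, not what was actually run on this cluster: `check_host` is a hypothetical helper, and the loopback test is an extra assumption that is not part of the thread (a hostname that resolves to 127.x is a commonly reported reason for daemons that start but never register).

```python
import socket
import sys


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for one hostname (empty if it looks consistent).

    `forward` and `reverse` default to the system resolver, so /etc/hosts and
    DNS are consulted in whatever order the node is configured to use.
    """
    problems = []
    try:
        ip = forward(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"{name}: forward lookup failed ({exc})"]

    # Assumption: a node name should not map to a loopback address.
    if ip.startswith("127."):
        problems.append(f"{name}: resolves to loopback address {ip}")

    try:
        rname, aliases, _ = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"{name}: reverse lookup of {ip} failed ({exc})")
        return problems

    # Lenient comparison: accept a match on the full name or on the short name.
    names = {rname.lower(), *(a.lower() for a in aliases)}
    short_names = {n.split(".")[0] for n in names}
    wanted = name.lower()
    if wanted not in names and wanted.split(".")[0] not in short_names:
        problems.append(f"{name}: {ip} maps back to {rname}, not {name}")
    return problems


if __name__ == "__main__":
    # Usage: python check_dns.py node01 node02 ...
    for host in sys.argv[1:]:
        for line in check_host(host) or [f"{host}: ok"]:
            print(line)
```

Running the same script on a netbooted node and on a node with a local OS install, and comparing the output, would show whether the two really resolve names identically.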
>>
>> One more bit of information. The system boots on a 1Gb network; all
>> other network traffic, i.e. MPI and NFS to data volumes, is via IB.
>>
>> The IB network also has proper forward/reverse DNS entries. IB IP
>> addresses are set up at boot time via a script that takes the host IP
>> and a fixed offset to calculate the address for the IB interface. I
>> have also confirmed that the IB IP addresses match our DNS.
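
The host-IP-plus-offset scheme described here can be illustrated as follows. The real boot script and its offset are not shown in the thread, so `IB_OFFSET`, `ib_address`, and `ib_matches_dns` are hypothetical names, and the offset value is only an example.

```python
import ipaddress
import socket

# Assumption: the offset is added to the integer value of the host address.
# 256 is an illustrative value, not the one used on this cluster.
IB_OFFSET = 256


def ib_address(host_ip, offset=IB_OFFSET):
    """Derive the IB interface address from the Ethernet host address."""
    return str(ipaddress.IPv4Address(host_ip) + offset)


def ib_matches_dns(host_ip, ib_name, offset=IB_OFFSET,
                   forward=socket.gethostbyname):
    """Check that the DNS entry for the IB name equals the derived address."""
    return forward(ib_name) == ib_address(host_ip, offset)
```

With these example values, a host at 10.1.0.17 would get the IB address 10.1.1.17, and `ib_matches_dns` would compare that against whatever the resolver returns for the node's IB hostname.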
>>
>> -Nick
>>
>>
>>> On Mon, Sep 28, 2009 at 9:45 AM, Nick Rathke <nick@sci.utah.edu> wrote:
>>>
>>>
>>>> I am hoping that someone can help with this issue. I have a 64-node
>>>> cluster that we would like to run Hadoop on; most of the nodes are
>>>> netbooted via NFS.
>>>>
>>>> Hadoop runs fine on nodes IF the node uses a local OS install, but
>>>> doesn't work when nodes are netbooted. Under netboot I can see that the
>>>> slaves have the correct Java processes running, but the Hadoop web pages
>>>> never show the nodes as available. The Hadoop logs on the nodes also show
>>>> that everything is running and started up correctly.
>>>>
>>>> On the few nodes that have a local OS installed, everything works just
>>>> fine and I can run the test jobs without issue (so far).
>>>>
>>>> I am using the identical Hadoop install and configuration between
>>>> netbooted nodes and non-netbooted nodes.
>>>>
>>>> Has anyone encountered this type of issue?
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>> -- 
>> Nick Rathke
>> Scientific Computing and Imaging Institute
>> Sr. Systems Administrator
>> nick@sci.utah.edu
>> www.sci.utah.edu
>> 801-587-9933
>> 801-557-3832
>>
>> "I came I saw I made it possible" Royal Bliss - Here They Come
>

