hadoop-general mailing list archives

From Michael Thomas <tho...@hep.caltech.edu>
Subject Re: rack awareness help
Date Sat, 20 Mar 2010 16:15:39 GMT
On 03/20/2010 05:55 AM, Mag Gam wrote:
> Thanks!
>
> I managed to write a script which will give me rack01, rack02
> depending on the ip address.
>
> Now, how would I check how many racks there are in my cluster? That's
> not documented anywhere.

'hadoop fsck /' reports the number of racks at the end of its output:

[...]
  Number of data-nodes:          163
  Number of racks:               12

You can verify the IP-to-rack mappings with 'hadoop dfsadmin -report':
[...]
Name: 10.3.255.144:50010
Rack: /Rack12
[...]
Name: 10.3.255.62:50010
Rack: /Rack16
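
If you want to check the mappings programmatically, the report can be parsed with a few lines of Python. This is only a rough sketch: it assumes every 'Name:' line is followed by the matching 'Rack:' line, as in the excerpt above, and the function name parse_report is made up for illustration.

```python
def parse_report(text):
    """Map each 'Name:' entry to the 'Rack:' value that follows it."""
    racks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            # Split only on the first colon so the port stays attached.
            name = line.split(":", 1)[1].strip()
        elif line.startswith("Rack:") and name is not None:
            racks[name] = line.split(":", 1)[1].strip()
            name = None
    return racks


# Sample text copied from the report excerpt above.
sample = """Name: 10.3.255.144:50010
Rack: /Rack12
Name: 10.3.255.62:50010
Rack: /Rack16
"""

mapping = parse_report(sample)
for node, rack in mapping.items():
    print(node, rack)
print(len(set(mapping.values())), "distinct racks in this sample")
```

To run it against a live cluster, you could replace the sample string with sys.stdin.read() and pipe the output of 'hadoop dfsadmin -report' into the script.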

--Mike

> My intention, once I get all this working, is to submit a bug report
> to the documentation team so they can fix this.
>
>
>
> On Fri, Mar 19, 2010 at 10:13 AM, Allen Wittenauer
> <awittenauer@linkedin.com>  wrote:
>>
>>
>>
>> On 3/19/10 4:32 AM, "Mag Gam"<magawake@gmail.com>  wrote:
>>
>>> Thanks everyone. I think everyone can agree that this part of the
>>> documentation is lacking for hadoop.
>>>
>>> Can someone please provide me with a use case, for example:
>>>
>>> #server 1
>>> Input>  script.sh
>>> Output>  rack01
>>>
>>> #server 2
>>> Input>  script.sh
>>> Output>  rack02
>>
>> I think you have it in your head that the NameNode asks the DataNode what
>> rack it is in.  This is completely backwards.  The DataNode has *no* concept of
>> what a rack is.  It is purely a storage process.  There isn't much logic in
>> it at all.
>>
>> The topology script is *ONLY* run by the NameNode and JobTracker processes.
>> That's it.  It is not run on the compute nodes.  That setting is completely
>> *ignored* by the DataNode and TaskTracker processes.
>>
>> So to rewrite your use case:
>>
>> # NameNode
>> Input>  server 1
>> Output>  rack01
>>
>> # NameNode
>> Input>  server 2
>> Output>  rack02
>>
>>> Is this how it's supposed to work? I am bad with bash, so I am trying to
>>> understand the logic so I can implement it in another language such
>>> as Tcl.
>>
>>
>> The program logic is:
>>
>> Input ->  IP address or Hostname
>> Output ->  /racklocation
>>
>> That's it.  There is nothing fancy going on here.
>>
>>
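
For anyone who would rather not write this in bash, the input/output logic described above might look roughly like the following in Python. Treat it as a sketch under assumptions: the subnet table and the rack names are invented for illustration, hostnames are not resolved, and anything unrecognized falls back to /default-rack. Hadoop can pass several addresses in a single call, so the script prints one rack per argument, in order.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script: addresses in, rack paths out."""
import ipaddress
import sys

# Hypothetical subnet-to-rack table; replace with your own layout.
RACKS = {
    ipaddress.ip_network("10.3.1.0/24"): "/rack01",
    ipaddress.ip_network("10.3.2.0/24"): "/rack02",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for one IP address, or the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP address (e.g. a hostname); this sketch does not resolve it.
        return DEFAULT_RACK
    for network, rack in RACKS.items():
        if addr in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack per argument, separated by spaces, in the same order.
    print(" ".join(rack_for(arg) for arg in sys.argv[1:]))
```

The script has to be executable and is only ever invoked on the NameNode and JobTracker hosts, so that is where the lookup table needs to live.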


