hadoop-general mailing list archives

From Mag Gam <magaw...@gmail.com>
Subject Re: rack awareness help
Date Sat, 20 Mar 2010 12:55:08 GMT
Thanks!

I managed to write a script which will give me rack01, rack02
depending on the ip address.

Now, how would I check how many racks there are in my cluster? That's
not documented anywhere.

My intention, once I get all this working, is to submit a bug report
to the documentation team so they can fix this.



On Fri, Mar 19, 2010 at 10:13 AM, Allen Wittenauer
<awittenauer@linkedin.com> wrote:
>
>
>
> On 3/19/10 4:32 AM, "Mag Gam" <magawake@gmail.com> wrote:
>
>> Thanks everyone. I think everyone can agree that this part of the
>> Hadoop documentation is lacking.
>>
>> Can someone please provide me with a use case, for example:
>>
>> #server 1
>> Input > script.sh
>> Output > rack01
>>
>> #server 2
>> Input > script.sh
>> Output > rack02
>
> I think you have it in your head that the NameNode asks the DataNode what
> rack it is in.  This is completely backwards.  The DataNode has *no* concept of
> what a rack is.  It is purely a storage process.  There isn't much logic in
> it at all.
>
> The topology script is *ONLY* run by the NameNode and JobTracker processes.
> That's it.  It is not run on the compute nodes.  That setting is completely
> *ignored* by the DataNode and TaskTracker processes.
>
> So to rewrite your use case:
>
> # NameNode
> Input > server 1
> Output > rack01
>
> # NameNode
> Input > server 2
> Output > rack02
>
>> Is this how it's supposed to work? I am bad with bash so I am trying to
>> understand the logic so I can implement it with another language such
>> as Tcl
>
>
> The program logic is:
>
> Input -> IP address or Hostname
> Output -> /racklocation
>
> That's it.  There is nothing fancy going on here.
>
>
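
The program logic quoted above (IP address or hostname in, /racklocation
out) can be sketched in a few lines. This is only an illustration, not the
script from this thread: the subnets, the rack names in the RACKS table and
the resolve_rack helper are hypothetical, and falling back to /default-rack
for anything unrecognised (including hostnames) is an assumption made for
the example. It also assumes the script may be called with several
addresses at once and should print one rack per argument, in order.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script: node addresses in, rack paths out."""
import ipaddress
import sys

# Hypothetical subnet-to-rack table; replace with the real cluster layout.
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/rack01",
    ipaddress.ip_network("10.1.2.0/24"): "/rack02",
}

# Assumed fallback for any node the table does not cover.
DEFAULT_RACK = "/default-rack"


def resolve_rack(node):
    """Return the rack path for an IP address, or DEFAULT_RACK if unknown."""
    try:
        addr = ipaddress.ip_address(node)
    except ValueError:
        # Not an IP literal (for example a hostname): use the fallback.
        return DEFAULT_RACK
    for network, rack in RACKS.items():
        if addr in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One output line per argument, in the same order as the input.
    for arg in sys.argv[1:]:
        print(resolve_rack(arg))
```

Running the script by hand on the NameNode host with a couple of DataNode
addresses as arguments is a quick way to check the mapping before pointing
Hadoop at it.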
