hadoop-general mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: rack awareness help
Date Fri, 19 Mar 2010 15:13:21 GMT

On 3/19/10 4:32 AM, "Mag Gam" <magawake@gmail.com> wrote:

> Thanks everyone. I think everyone can agree that this part of the
> documentation is lacking for hadoop.
> Can someone please provide me a use case, for example:
> #server 1
> Input > script.sh
> Output > rack01
> #server 2
> Input > script.sh
> Output > rack02

I think you have it in your head that the NameNode asks the DataNode what
rack it is in.  This is completely backwards.  The DataNode has *no* concept
of what a rack is.  It is purely a storage process.  There isn't much logic
in it at all.

The topology script is *ONLY* run by the NameNode and JobTracker processes.
That's it.  It is not run on the compute nodes.  That setting is completely
*ignored* by the DataNode and TaskTracker processes.

So to rewrite your use case:

# NameNode 
Input > server 1
Output > rack01

# NameNode
Input > server 2
Output > rack02

> Is this how it's supposed to work? I am bad with bash so I am trying to
> understand the logic so I can implement it with another language such
> as tcl

The program logic is:

Input -> IP address or Hostname
Output -> /racklocation

That's it.  There is nothing fancy going on here.  
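
Since the logic is language-neutral, here is a minimal sketch of it in Python rather than bash. The hostnames, IP addresses, rack names, and the lookup table itself are hypothetical placeholders, not anything from a real cluster; a real script would read its mapping from wherever the site keeps it. The sketch also assumes the script may be handed more than one address per invocation and answers with one rack path per argument, and that unknown hosts should fall back to a default rack.

```python
#!/usr/bin/env python
"""Sketch of a topology script: hostname or IP address in, rack path out."""
import sys

# Hypothetical lookup table. Every name and address here is illustrative only.
RACK_BY_HOST = {
    "server1": "/rack01",
    "10.0.1.11": "/rack01",
    "server2": "/rack02",
    "10.0.2.12": "/rack02",
}

# Assumed fallback for hosts that are missing from the table.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for one hostname or IP address."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)


def main(argv):
    # One rack path per argument, in the same order, separated by spaces.
    print(" ".join(resolve_rack(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as topology.py and run by hand as `python topology.py server1 server2`, this prints `/rack01 /rack02`. It would only ever be invoked on the NameNode and JobTracker hosts, never on the DataNodes or TaskTrackers.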
