On 3/19/10 4:32 AM, "Mag Gam" wrote:
> Thanks everyone. I think everyone can agree that this part of the
> documentation is lacking for hadoop.
>
> Can someone please provide me a use case, for example:
>
> #server 1
> Input > script.sh
> Output > rack01
>
> #server 2
> Input > script.sh
> Output > rack02

I think you have it in your head that the NameNode asks the DataNode
what rack it is in. This is completely backwards. The DataNode has *no*
concept of what a rack is. It is purely a storage process. There isn't
much logic in it at all.

The topology script is *ONLY* run by the NameNode and JobTracker
processes. That's it. It is not run on the compute nodes. That setting
is completely *ignored* by the DataNode and TaskTracker processes.

So to rewrite your use case:

# NameNode
Input > server 1
Output > rack01

# NameNode
Input > server 2
Output > rack02

> Is this how it's supposed to work? I am bad with bash so I am trying to
> understand the logic so I can implement it with another language such
> as tcl

The program logic is:

Input -> IP address or Hostname
Output -> /racklocation

That's it. There is nothing fancy going on here.
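
For what it's worth, the logic above can be sketched in a few lines of
Python; the same shape should carry over to tcl. This is only an
illustration, not a reference implementation. The host names, IP
addresses, and the RACK_BY_HOST table are made up for the example, and
the "/default-rack" fallback is an assumption about what you want for
unknown nodes. It also assumes the script may be handed several
addresses in one call and should print one rack per address, in the
same order.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script, meant to be run by the NameNode/JobTracker."""
import sys

# Hypothetical lookup table: replace with your own hosts, IPs, and racks.
RACK_BY_HOST = {
    "server1": "/rack01",
    "10.0.0.1": "/rack01",
    "server2": "/rack02",
    "10.0.0.2": "/rack02",
}

# Assumed fallback for nodes that are not listed in the table.
DEFAULT_RACK = "/default-rack"


def resolve(names):
    """Return one rack location per input name, in the same order."""
    return [RACK_BY_HOST.get(name, DEFAULT_RACK) for name in names]


if __name__ == "__main__":
    # Input: IP addresses or hostnames as command-line arguments.
    # Output: the matching rack locations on stdout, space-separated.
    print(" ".join(resolve(sys.argv[1:])))
```

Running it by hand with "server1 server2" as arguments should print
"/rack01 /rack02", which is a quick way to check the table before
pointing the NameNode at the script.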