hadoop-general mailing list archives

From Mag Gam <magaw...@gmail.com>
Subject Re: rack awareness help
Date Fri, 19 Mar 2010 11:32:20 GMT
Thanks, everyone. I think we can all agree that this part of the
Hadoop documentation is lacking.

Can someone please provide me with a use case? For example:

#server 1
Input > script.sh
Output > rack01

#server 2
Input > script.sh
Output > rack02


Is this how it's supposed to work? I am bad with bash, so I am trying to
understand the logic so that I can implement it in another language such
as Tcl.
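
To make the input/output contract concrete, here is a minimal sketch of a single script that lives on the namenode, as the reply quoted below describes. It assumes Hadoop may pass several host names or IP addresses in one invocation and expects one rack per argument, in order. The function name `rack_for_host`, the host names, and the leading-slash rack names (modelled on Hadoop's `/default-rack`) are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/bash
# rack_decider.sh: hypothetical sketch, installed on the namenode only.

# Map one host name or IP address to a rack name.
# The hosts and racks listed here are placeholders for illustration.
rack_for_host() {
  case "$1" in
    server1.mydomain|192.168.0.1|server2.mydomain|192.168.0.2)
      echo "/rack1" ;;
    server3.mydomain|192.168.0.3|server4.mydomain|192.168.0.4)
      echo "/rack2" ;;
    *)
      # Anything not listed falls back to a default rack.
      echo "/default-rack" ;;
  esac
}

# Print one rack per argument, in the same order as the arguments.
for host in "$@"; do
  rack_for_host "$host"
done
```

Run by hand as `./rack_decider.sh server1.mydomain 192.168.0.3`, this prints `/rack1` and then `/rack2`, one per line. The same lookup could be written in Tcl or any other language, since Hadoop only sees the arguments it passes and the text the script prints.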


On Fri, Mar 19, 2010 at 1:00 AM, Christopher Tubbs <ctubbsii@gmail.com> wrote:
> You only specify the script on the namenode.
> So, you could do something like:
>
> #!/bin/bash
> #rack_decider.sh
>
> if [ "$1" = "server1.mydomain" ] || [ "$1" = "192.168.0.1" ] ; then
>  echo rack1
> elif [ "$1" = "server2.mydomain" ] || [ "$1" = "192.168.0.2" ] ; then
>  echo rack1
> elif [ "$1" = "server3.mydomain" ] || [ "$1" = "192.168.0.3" ] ; then
>  echo rack2
> elif [ "$1" = "server4.mydomain" ] || [ "$1" = "192.168.0.4" ] ; then
>  echo rack2
> else
>  echo unknown_rack
> fi
> # EOF
>
> Of course, this is by far the most basic script you could have (I'm
> not sure why it wasn't offered as an example instead of a more
> complicated one).
>
> On Thu, Mar 18, 2010 at 8:41 PM, Mag Gam <magawake@gmail.com> wrote:
>> Chris:
>>
>> This clears up my questions a lot! Thank you.
>>
>> So, if I have 4 data servers and I want 2 racks, I can do this:
>>
>> #!/bin/bash
>> #rack1.sh
>> echo rack1
>>
>> #!/bin/bash
>> #rack2.sh
>> echo rack2
>>
>>
>> So, I can do this for 2 servers:
>>
>>
>> <property>
>>  <name>topology.script.file.name</name>
>>  <value>rack1.sh</value>
>> </property>
>>
>> And for the other 2 servers, I can do this:
>>
>>
>> <property>
>>  <name>topology.script.file.name</name>
>>  <value>rack2.sh</value>
>> </property>
>>
>>
>> correct?
>>
>>
>> On Thu, Mar 18, 2010 at 3:15 AM, Christopher Tubbs <ctubbsii@gmail.com> wrote:
>>> Hadoop will identify data nodes in your cluster by name and execute
>>> your script with the data node as an argument. The expected output of
>>> your script is the name of the rack on which it is located.
>>>
>>> The script you referenced takes the node name as an argument ($1), and
>>> crawls through a separate file looking up that node in the left
>>> column, and printing the value in the second column if it finds it.
>>>
>>> If you were to use this script, you would just create the topology
>>> file that lists all your nodes by name/ip on the left and the rack
>>> they are in on the right.
>>>
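
The table-lookup approach described just above can be sketched roughly as follows. This is not the wiki script itself, only a minimal illustration of the same idea. The file path, the `TOPOLOGY_FILE` variable, the `lookup_rack` function name, and the `/default-rack` fallback are assumptions made for this example. The data file is assumed to hold one `<name-or-ip> <rack>` pair per line.

```shell
#!/bin/bash
# Hypothetical sketch of a table-driven topology script.
# TOPOLOGY_FILE is an assumed location; each line holds "<node> <rack>".
TOPOLOGY_FILE="${TOPOLOGY_FILE:-/etc/hadoop/topology.data}"

# Print the second column of the first line whose first column
# equals the given node; print a fallback rack if nothing matches.
lookup_rack() {
  awk -v node="$1" '
    $1 == node { print $2; found = 1; exit }
    END        { if (!found) print "/default-rack" }
  ' "$TOPOLOGY_FILE"
}

# Answer every node passed on the command line, one rack per line.
for node in "$@"; do
  lookup_rack "$node"
done
```

With this layout, adding a machine means adding one line to the data file; the script itself does not change.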
>>> On Wed, Mar 17, 2010 at 11:34 PM, Mag Gam <magawake@gmail.com> wrote:
>>>> Well, I didn't really solve the problem. Now I have even more questions.
>>>>
>>>> I came across this script,
>>>> http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
>>>>
>>>> but it makes no sense to me! Can someone please try to explain what
>>>> it's trying to do?
>>>>
>>>>
>>>> MikeThomas:
>>>>
>>>> Your script isn't working for me. I think there are some syntax
>>>> errors. Is this how it's supposed to look: http://pastebin.ca/1844287
>>>>
>>>> thanks
>>>>
>>>>
>>>>
>>>> On Thu, Mar 4, 2010 at 10:30 PM, Jeff Hammerbacher <hammer@cloudera.com> wrote:
>>>>> Hey Mag,
>>>>>
>>>>> Glad you have solved the problem. I've created a JIRA ticket to improve the
>>>>> existing documentation: https://issues.apache.org/jira/browse/HADOOP-6616.
>>>>> If you have some time, it would be useful to hear what could be added to the
>>>>> existing documentation that would have helped you figure this out sooner.
>>>>>
>>>>> Thanks,
>>>>> Jeff
>>>>>
>>>>> On Thu, Mar 4, 2010 at 3:39 PM, Mag Gam <magawake@gmail.com> wrote:
>>>>>
>>>>>> Thanks everyone for explaining this to me instead of giving me RTFM!
>>>>>>
>>>>>> I will play around with it and see how far I get.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 4, 2010 at 9:21 AM, Steve Loughran <stevel@apache.org> wrote:
>>>>>> > Allen Wittenauer wrote:
>>>>>> >>
>>>>>> >> On 3/3/10 5:01 PM, "Mag Gam" <magawake@gmail.com> wrote:
>>>>>> >>
>>>>>> >>> Thanks Alan! Your presentation is very nice!
>>>>>> >>
>>>>>> >> Thanks. :)
>>>>>> >>
>>>>>> >>> "If you don't provide a script for rack awareness, it treats every
>>>>>> >>> node as if it was its own rack". I am using the default settings and
>>>>>> >>> the report still says only 1 rack.
>>>>>> >>
>>>>>> >> Let's take a different approach to convince you. :)
>>>>>> >>
>>>>>> >> Think about the question: Is there a difference between all nodes in one
>>>>>> >> rack vs. every node acting as a lone rack?
>>>>>> >>
>>>>>> >> The answer is no, there isn't any difference. In both cases, all copies of
>>>>>> >> the blocks can go to pretty much any node. When a MR job runs, every node
>>>>>> >> would either be considered 'off rack' or 'rack-local'.
>>>>>> >>
>>>>>> >> So there is no difference.
>>>>>> >>
>>>>>> >>
>>>>>> >>> Do you mind sharing a script with us on how you determine a rack? and
>>>>>> >>> a sample <configuration> </configuration> syntax?
>>>>>> >>
>>>>>> >> Michael has already posted his, so I'll skip this one. :)
>>>>>> >>
>>>>>> >
>>>>>> > Think Mag probably wanted a shell script.
>>>>>> >
>>>>>> > Mag, give your machines IPv4 addresses that map to rack number. 10.1.1.* for
>>>>>> > rack one, 10.1.2.* for rack 2, etc. Then just filter out the IP address by
>>>>>> > the top bytes, returning "10.1.1" for everything in rack one, "10.1.2" for
>>>>>> > rack 2; Hadoop will be happy
>>>>>> >
>>>>>>
>>>>>
>>>>
>>>
>>
>
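
Steve Loughran's suggestion quoted above (number the machines so that the leading bytes of the IPv4 address identify the rack) could look something like the sketch below. It assumes the script is handed dotted-quad IP addresses. The function name `rack_from_ip`, the leading slash on the rack name, and the `/default-rack` fallback for anything that is not a dotted quad are assumptions of this example, not part of his message.

```shell
#!/bin/bash
# Hypothetical sketch: derive the rack from the first three octets.

rack_from_ip() {
  # Matches a.b.c.d and captures a.b.c; octet ranges are not validated.
  local re='^([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+$'
  if [[ "$1" =~ $re ]]; then
    echo "/${BASH_REMATCH[1]}"
  else
    # Host names or malformed input fall back to a default rack.
    echo "/default-rack"
  fi
}

# Answer every address passed on the command line, one rack per line.
for addr in "$@"; do
  rack_from_ip "$addr"
done
```

The appeal of this scheme is that there is no table to maintain: as long as the addressing plan is followed, new machines land in the right rack automatically.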
