hadoop-mapreduce-user mailing list archives

From "Bejoy KS" <bejoy.had...@gmail.com>
Subject Re: Map/Reduce | Multiple node configuration
Date Tue, 12 Jun 2012 07:08:12 GMT
Hi Girish

Let me try answering your queries.

1. For multiple nodes I understand I should add the URL of the secondary nodes in the slaves.xml.
Am I correct?

Bejoy: AFAIK the slave hostnames go into the plain-text conf/slaves file on the master (there is no slaves.xml), and you need the hostname-to-IP mappings in /etc/hosts on every node so the names resolve. An example is sketched below.
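
For example (the hostnames and IPs here are made up), conf/slaves on the master lists one slave hostname per line:

    slave1
    slave2
    slave3

and /etc/hosts on every node carries the matching mappings:

    192.168.1.10   master
    192.168.1.11   slave1
    192.168.1.12   slave2
    192.168.1.13   slave3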

2. What should be installed on the secondary nodes for executing a job/task?

Bejoy: You install the same Hadoop distribution on every node; the configuration decides which daemons run where. In small clusters you typically have the NameNode and JobTracker on one node, the SecondaryNameNode on another node, and a DataNode and TaskTracker on all other nodes.
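
For reference, a typical small-cluster layout looks like this (hostnames hypothetical):

    master     -> NameNode + JobTracker
    secondary  -> SecondaryNameNode (listed in conf/masters)
    slave1..N  -> DataNode + TaskTracker (listed in conf/slaves)

Once conf/masters and conf/slaves are in place, you start everything from the master:

    bin/start-dfs.sh      # starts the NameNode here and a DataNode on each slave
    bin/start-mapred.sh   # starts the JobTracker here and a TaskTracker on each slave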

3. I understand I can set the map/reduce classes as a jar to the Job - through the JobConf
- so does this mean I need not really install/copy my map/reduce code on all the secondary
nodes?

Bejoy: There is no difference in submitting jobs compared to a pseudo-distributed setup. The MapReduce
framework distributes the job jar and other required files to the task nodes for you, so you do not
need to copy your map/reduce code onto the slave nodes. It is better to have a dedicated client
node to launch jobs from. A sketch follows below.
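
To illustrate, here is a minimal driver sketch using the old org.apache.hadoop.mapred API of that era. The class name, job name, and paths are made up, and the built-in identity mapper/reducer are just placeholders for your own classes:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SubmitJob {
    public static void main(String[] args) throws Exception {
        // JobConf(Class) tells Hadoop which jar to ship: the one containing this class.
        JobConf conf = new JobConf(SubmitJob.class);
        conf.setJobName("identity-example"); // made-up job name

        // Identity mapper/reducer pass records through unchanged; swap in your own classes.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // With the default TextInputFormat, keys are LongWritable offsets and values are Text lines.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // JobClient stages the jar in HDFS and the TaskTrackers pull it down,
        // so nothing needs to be pre-installed on the slave nodes.
        JobClient.runJob(conf);
    }
}

The JobConf(SubmitJob.class) constructor is what lets Hadoop locate and ship the jar, so the only machine that needs your code is the one you submit from.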

4. How do I route the data to these nodes? Is it required for the MapReduce tasks to execute on
the machines which have the data stored (DFS)?

Bejoy: The MR framework takes care of this. You load the input into HDFS, and the JobTracker schedules map tasks with data locality in mind, preferring nodes that hold a replica of a task's input split. See the commands below.
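
In practice you just copy the input into HDFS and submit the job; no manual routing is needed. For example (the paths and the jar from the sketch above are made up):

    bin/hadoop fs -mkdir /user/girish/input
    bin/hadoop fs -put localdata.txt /user/girish/input
    bin/hadoop jar myjob.jar SubmitJob /user/girish/input /user/girish/output

HDFS spreads the blocks (with replication) across the DataNodes when you load the file, and the scheduler then moves the computation to the data rather than the other way around.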


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Girish Ravi <girishr@srmtech.com>
Date: Tue, 12 Jun 2012 06:55:26 
To: mapreduce-user@hadoop.apache.org<mapreduce-user@hadoop.apache.org>
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Map/Reduce | Multiple node configuration

Hello Team,

I have started learning about Hadoop MapReduce and was able to set up a single-node cluster
execution environment.

I want to now extend this to a multi node environment.
I have the following questions and it would be very helpful if somebody can help:
1. For multiple nodes I understand I should add the URL of the secondary nodes in the slaves.xml.
Am I correct?
2. What should be installed on the secondary nodes for executing a job/task?
3. I understand I can set the map/reduce classes as a jar to the Job - through the JobConf
- so does this mean I need not really install/copy my map/reduce code on all the secondary
nodes?
4. How do I route the data to these nodes? Is it required for the MapReduce tasks to execute on
the machines which have the data stored (DFS)?

Any samples for doing this would help. Suggestions are much appreciated.

Regards
Girish
Ph: +91-9916212114
