hadoop-common-user mailing list archives

From "Barry, Sean F" <sean.f.ba...@intel.com>
Subject RE: multiple nodes one machine
Date Mon, 09 Apr 2012 20:22:20 GMT
Harsh, 
I am interested in adding datanodes just for testing.


I have a few more things I should have said earlier. 
My current cluster looks like this. I set it up exactly as described in the tutorial at
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
except that I am running Suse 12.1 on both boxes.

Master - running NameNode, DataNode, TaskTracker, SecondaryNameNode and JobTracker
------
Slave - running DataNode and TaskTracker

My (4 core) slave machine is the one to which I would like to add three additional datanodes.
But when I use the run-additionalDN.sh script I get a usage message: Usage: java DataNode [-rollback].

Am I supposed to run the script on my master node or slave node?

-SB



-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Monday, April 09, 2012 11:35 AM
To: common-user@hadoop.apache.org
Subject: Re: multiple nodes one machine

Barry,

Depends on what you'll be testing. If you want more daemons, then yes, you need to run more
nodes on the same box (the configs can be tweaked to achieve this). If you just want MR to provide
more slots for tasks, then editing a specific TaskTracker property alone is enough.

For more daemons, see http://search-hadoop.com/m/a4klk28NUr12 and a neat config I use for
running them without too much config mess:
https://gist.github.com/2345300
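
A minimal sketch of the kind of override each extra DataNode needs, assuming Hadoop 1.x property names. This is not the content of the gist above; the conf-dn2 directory name, the /data/dn2 path and the port numbers are placeholders chosen for illustration. The point is only that every DataNode on the box must get its own storage directory and its own three listening ports:

```xml
<!-- conf-dn2/hdfs-site.xml (hypothetical second conf dir, otherwise a copy
     of the existing conf/). Paths and ports are placeholders; any values
     not already in use on the machine should do. -->
<configuration>
  <property>
    <!-- Block storage; must not be shared with another DataNode -->
    <name>dfs.data.dir</name>
    <value>/data/dn2</value>
  </property>
  <property>
    <!-- Data transfer port (first DataNode uses the default 50010) -->
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50011</value>
  </property>
  <property>
    <!-- Web UI port (default 50075) -->
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50076</value>
  </property>
  <property>
    <!-- IPC port (default 50020) -->
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50021</value>
  </property>
</configuration>
```

Each extra daemon would then be started against its own conf dir (hadoop-daemon.sh accepts --config), with HADOOP_PID_DIR and HADOOP_LOG_DIR also pointed at per-daemon directories so the PID and log files do not collide.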

For the latter, see:
http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F
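
For reference, a sketch of the TaskTracker slot settings that FAQ entry is about, assuming Hadoop 1.x property names. The values 4 and 2 are only illustrative for a 4 core box, not a recommendation; both properties default to 2:

```xml
<!-- mapred-site.xml on the TaskTracker node. Values are illustrative. -->
<configuration>
  <property>
    <!-- Max map tasks this TaskTracker runs concurrently -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <!-- Max reduce tasks this TaskTracker runs concurrently -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

The TaskTracker reads these at startup, so it has to be restarted for a change to take effect.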

Alternatively, use the classes provided in the hadoop-test jar:
MiniDFSCluster and MiniMRCluster, which can run from within a test suite itself (with multiple
threads acting as daemons, to simply test around with).

On Mon, Apr 9, 2012 at 9:52 PM, Barry, Sean F <sean.f.barry@intel.com> wrote:
> Hi all,
>
> I currently have a 2 node cluster up and running. But now I face a new issue: one of
> my nodes is running a DataNode and a TaskTracker on a 4 core machine, and in order to do a
> bit of proof of concept testing I would like to have 4 nodes running on that particular machine.
> Does this mean that I would need to set that up as a pseudo-distributed cluster? Or do you
> have any other suggestions? And would I need to add 3 more datanodes and 3 more tasktrackers,
> or just one or the other?
>
> Thanks
> -SB



--
Harsh J
