giraph-user mailing list archives

From Alexander Sirotin <>
Subject Re: Hadoop Multi Node Cluster Configuration
Date Sat, 09 Aug 2014 22:56:56 GMT
Hi Xenia,

Which host is a master depends on which processes are running on it. In
my view, the SecondaryNameNode is a master process.

I googled it and found out that I had misused the masters file and told
you the wrong thing about it. Check out this link:

It means that in the masters file you only list the hosts that will
start the SecondaryNameNode. The NameNode itself is started on the
local machine, i.e. the one where the start script is executed. The
JobTracker or ResourceManager must likewise be started locally on its
own machine, with the corresponding start script.
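
To make that layout concrete, here is a rough sketch in Python that derives
the contents of the two host files from a role assignment. The hostnames
(master, secondary, datanode1, datanode2) and the helper names are made up
for illustration and are not taken from your cluster; the only thing it
relies on is that these files list one hostname per line.

```python
# Hypothetical hostnames; replace them with the names or IPs of your machines.
ROLES = {
    "master": ["NameNode", "ResourceManager"],
    "secondary": ["SecondaryNameNode"],
    "datanode1": ["DataNode"],
    "datanode2": ["DataNode"],
}


def hosts_with(role, roles=ROLES):
    """Return the hosts that run the given process, in definition order."""
    return [host for host, procs in roles.items() if role in procs]


def render_host_file(hosts):
    """Render a host file body: one hostname per line."""
    return "\n".join(hosts) + "\n"


# masters: only the SecondaryNameNode hosts, not the NameNode itself.
masters_content = render_host_file(hosts_with("SecondaryNameNode"))
# slaves: the hosts that run a DataNode.
slaves_content = render_host_file(hosts_with("DataNode"))

print("masters:\n" + masters_content)
print("slaves:\n" + slaves_content)
```

The same two files would then be copied to every machine, as you already do.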

Take care to set the IP addresses in yarn-site.xml and distribute the 

The link above explains the SecondaryNameNode-stuff in more detail.

How you distribute the Hadoop processes across your machines depends on
your hardware resources and expected usage. If, for example, you need
more disk space one day, add the machine where the NameNode or
ResourceManager is running to the slaves file. The master processes
themselves don't need much disk space for a 4-machine cluster ;-). The
following link explains that they only store the filesystem metadata.
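
As a small illustration of that step, assuming a slaves list with the
hypothetical hosts datanode1 and datanode2 and a NameNode host called
master (all names invented for this example), adding the master machine
as an extra slave could look like this:

```python
def add_slave(slaves, host):
    """Return a slaves list that includes host, without duplicating it."""
    return slaves if host in slaves else slaves + [host]


# Hypothetical current content of the slaves file, one host per entry.
slaves = ["datanode1", "datanode2"]

# The machine running the NameNode also becomes a DataNode host.
slaves = add_slave(slaves, "master")

# One hostname per line, as it would be written to the slaves file.
print("\n".join(slaves))
```

The updated file has to be distributed to all machines again, and the new
DataNode only starts once the daemons are (re)started on that host.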

And by the way, your question is a better fit for the Hadoop mailing list
than for the Giraph one.

Best regards,

On 09.08.2014 21:33, Xenia Demetriou wrote:
> Hi Alexander,
> Thanks for your help. I am not an expert either.
> In my cluster (4 machines) I have defined the following:
> In the master file on all the machines, I define two of the machines as
> Master and SecondaryNameNode.
> And in the slave file on all the machines, I define the other two machines
> as DataNode1 and DataNode2.
> I don't know if Master and SecondaryNameNode can also be defined as
> slaves, or if it is better to define the SecondaryNameNode as a slave
> instead of a master.
> Thanks
