hadoop-common-user mailing list archives

From "Indranil Majumder (imajumde)" <imaju...@cisco.com>
Subject Hadoop setup doubts
Date Sat, 14 Dec 2013 10:33:34 GMT
I started with Hadoop a few days ago, and I have a few doubts about the setup:

1.       For the NameNode I format the name directory. Is it recommended to do the same for
the DataNode directories too?

2.       How does log aggregation work?

3.       Does the Resource Manager run on every node (both Name and Data), or can it run as a separate

4.       What is the purpose of the webproxy? Is it really required?

5.       Is there any documentation on how to decide which scheduler type to use based on certain

6.       What is the recommended way of pushing data into a Hadoop cluster and submitting
MapReduce jobs, i.e. should we use a separate client node? If so, is there any client daemon to
run on it?

7.       For the following nodes in clustered mode

A.      NameNode

B.      Secondary NameNode

C.      DataNode (2)

D.      Resource Manager

E.       WebProxy

F.       History Server (MapReduce)
I want to write a PID monitor. Does anybody have the list of processes that would run on this
cluster when fully operational [maybe the output of ps -ef | grep "somekeyword" will do]?
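
A minimal sketch of such a PID monitor, for illustration only. It assumes each daemon appears in `ps -ef` output as a Java process whose command line contains its main class as a separate token. The class names in EXPECTED_DAEMONS are the usual Hadoop 2.x ones but are an assumption here and should be checked against the actual cluster; the function names (find_daemons, missing_daemons, check_local_node) are hypothetical.

```python
import subprocess

# Assumed Java main-class names (Hadoop 2.x); verify against your own cluster.
EXPECTED_DAEMONS = {
    "NameNode": "org.apache.hadoop.hdfs.server.namenode.NameNode",
    "SecondaryNameNode": "org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode",
    "DataNode": "org.apache.hadoop.hdfs.server.datanode.DataNode",
    "ResourceManager": "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager",
    "WebProxy": "org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer",
    "HistoryServer": "org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer",
}


def find_daemons(ps_output, expected=EXPECTED_DAEMONS):
    """Map each daemon label to the PIDs whose command line names its main class."""
    found = {label: [] for label in expected}
    for line in ps_output.splitlines():
        # `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something unparseable
        tokens = fields[7].split()
        for label, main_class in expected.items():
            if main_class in tokens:  # exact token match, not a substring match
                found[label].append(int(fields[1]))
    return found


def missing_daemons(found, wanted):
    """Return the labels from `wanted` that have no running PID."""
    return [label for label in wanted if not found.get(label)]


def check_local_node(wanted):
    """Run `ps -ef` on this machine and report which wanted daemons are absent."""
    ps_output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    return missing_daemons(find_daemons(ps_output), wanted)
```

On each machine the monitor would be given only the daemons that node is meant to host, for example check_local_node(["DataNode"]) on a data node; an empty list back means every expected process was found.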

Thanks & Regards,
