hadoop-common-user mailing list archives

From Manjunath Hegde <hegma...@gmail.com>
Subject Re: Hadoop setup doubts
Date Sun, 15 Dec 2013 10:43:00 GMT
Please find my answers inline. I also started with Hadoop only a few days
ago, so I may be wrong.

1.       For name node I do format the name directory, is it recommended to
do the same for the data node directories too.
-----No, we do not format the DataNode directories.

2.       How does log aggregation work?

3.       Does resource manager run on every node (both Name and Data) or it
can run as a separate node?

-------Only on the node you have specified. It will usually run on a single node.

4.       What is the purpose of the webproxy? Is it really required?

5.       Is there any documentation on how to decide which scheduler type
based on certain parameters?

6.       What is the recommended way of pushing  data into Hadoop cluster &
submitting  mapred jobs, i.e should we use another client  node, if so is
there any client daemon to run on it ?

---- Do you have experience with UNIX? If so, the Hadoop commands are similar
to UNIX commands. For example, the command below works fine for me.

hdfs dfs -copyFromLocal <localfiledir> <hdfs file directory>
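
As a rough sketch of the copy step, something like the wrapper below could be
used from whichever machine has the hdfs client on its PATH. The function name
push_to_hdfs, the HDFS_BIN variable and the example paths are made up for
illustration; only the hdfs dfs subcommands themselves come from the Hadoop
shell.

```shell
#!/bin/sh
# Hypothetical helper: copy a local file or directory into an HDFS directory.
# HDFS_BIN is an illustrative override; by default the real "hdfs" client is used.
HDFS_BIN="${HDFS_BIN:-hdfs}"

push_to_hdfs() {
    local_path=$1   # file or directory on the local filesystem
    hdfs_dir=$2     # target directory in HDFS

    # Create the target directory (and parents), copy the data, then list the result.
    "$HDFS_BIN" dfs -mkdir -p "$hdfs_dir" &&
    "$HDFS_BIN" dfs -copyFromLocal "$local_path" "$hdfs_dir" &&
    "$HDFS_BIN" dfs -ls "$hdfs_dir"
}

# Example call (paths are placeholders):
#   push_to_hdfs /tmp/input.txt /user/someuser/input
```

The steps are chained with && so that a failed mkdir or copy stops the sequence
instead of listing a directory that was never populated.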


7.       For the following nodes in clustered mode

A.      NameNode

B.      Secondary NameNode

C.      DataNode (2)

D.      Resource Manager

E.       WebProxy

F.       History Server( Map Reduce )

I want to write a PID monitor. Does anybody have the list of processes that
would run on this cluster when fully operational [maybe the output of ps -ef
| grep "somekeyword" will do]


--- Just use jps if you only need to monitor the processes. It really depends
on your requirements.
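
A minimal sketch of such a check, assuming jps prints one "<pid> <MainClass>"
pair per line. The function name missing_daemons is made up, and the class
names in the example call (NameNode, SecondaryNameNode, DataNode,
ResourceManager, WebAppProxyServer, JobHistoryServer) are what I would expect
jps to show for the daemons listed above; please confirm them on your own
cluster.

```shell
#!/bin/sh
# Hypothetical helper: read jps output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for name in "$@"; do
        # Compare the class-name column exactly, so that "NameNode"
        # is not satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            printf '%s\n' "$name"
        fi
    done
}

# Example call on a node expected to run the master daemons:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager \
#       WebAppProxyServer JobHistoryServer
# On a worker node the expected list would differ, for example: DataNode
```

An empty result means every expected daemon was found. Note that jps only
lists JVMs belonging to the user who runs it, so the check should run as the
same user that started the daemons.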



Thanks & Regards,
Indranil


On Sat, Dec 14, 2013 at 4:03 PM, Indranil Majumder (imajumde) <
imajumde@cisco.com> wrote:

>  I started with Hadoop a few days ago and have a few doubts about the setup,
>
>
>
> 1.       For name node I do format the name directory, is it recommended
> to do the same for the data node directories too.
>
> 2.       How does log aggregation work?
>
> 3.       Does resource manager run on every node (both Name and Data) or
> it can run as a separate node?
>
> 4.       What is the purpose of the webproxy? Is it really required?
>
> 5.       Is there any documentation on how to decide which scheduler type
> based on certain parameters?
>
> 6.       What is the recommended way of pushing  data into Hadoop cluster
> & submitting  mapred jobs, i.e should we use another client  node, if so is
> there any client daemon to run on it ?
>
> 7.       For the following nodes in clustered mode
>
> A.      NameNode
>
> B.      Secondary NameNode
>
> C.      DataNode (2)
>
> D.      Resource Manager
>
> E.       WebProxy
>
> F.       History Server( Map Reduce )
>
> I want to write a PID monitor. Does anybody have the list of processes that
> would run on this cluster when fully operational [maybe the output of ps -ef
> | grep "somekeyword" will do]
>
>
>
> Thanks & Regards,
>
> Indranil
>
