Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <C6EDA663-52C7-4506-9F19-C54AED05E031@beforedawnsolutions.com>
References: <DB85CF80-3E6F-4C4D-A7C7-0DD525D74FF5@beforedawnsolutions.com>
	<45f85f70911091011o2e87f392s5da09beee8cc2fe4@mail.gmail.com>
	<C6EDA663-52C7-4506-9F19-C54AED05E031@beforedawnsolutions.com>
From: Todd Lipcon <todd@cloudera.com>
Date: Tue, 10 Nov 2009 18:05:49 -0800
Message-ID: <45f85f70911101805v32e45aeag7f65ebe2794eeb3d@mail.gmail.com>
Subject: Re: NameNode/DataNode & JobTracker/TaskTracker
To: common-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=00504502b08cc5823b04780edea1

--00504502b08cc5823b04780edea1
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Nov 9, 2009 at 1:04 PM, John Martyniak
<john@beforedawnsolutions.com> wrote:

> Thanks Todd.
>
> I wasn't sure whether that was possible.  But you raised an important point:
> it is just the NN and JT that would run remotely.
>
> So in order to do this, would I just install the complete Hadoop instance on
> each one?  And would they then be configured as masters?
>
> Or should the NameNode and JobTracker run on the same machine, so that there
> would be only one master?
>
>
Either way works. On all but the largest clusters, the NN and JT are not
significant users of CPU. On medium-sized clusters they can start to use
multiple GBs of RAM. If you're using fewer than 30 nodes you can *probably*
get by with one machine for both; I say probably because it depends not
just on your total capacity but also on the number of files you have. I think
there are some rough sizing estimates if you search the list archives for
"CompressedOops"; someone did some measurements of the NN's memory
requirements.
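
To make the "number of files" point concrete, here is a rough back-of-the-
envelope sketch. It assumes roughly 150 bytes of NN heap per file, directory,
or block, which is a commonly quoted rule of thumb and not a figure measured
in this thread. The function names and the example workload are made up for
illustration, so treat the result as an order-of-magnitude guess only.

```python
# Rough NameNode heap estimate. The 150-bytes-per-object figure is a commonly
# quoted rule of thumb, NOT a measurement from this thread; treat it as an
# assumption and adjust it after observing a real cluster.
BYTES_PER_OBJECT = 150  # assumed cost of one file, directory, or block entry


def estimate_nn_heap_bytes(num_files, num_dirs, avg_blocks_per_file=1.5,
                           bytes_per_object=BYTES_PER_OBJECT):
    """Return an approximate heap size in bytes for the namespace metadata."""
    num_blocks = num_files * avg_blocks_per_file
    total_objects = num_files + num_dirs + num_blocks
    return int(total_objects * bytes_per_object)


def to_gib(num_bytes):
    """Convert bytes to GiB for easier reading."""
    return num_bytes / float(1024 ** 3)


if __name__ == "__main__":
    # Hypothetical workload: 10 million files in 1 million directories.
    heap = estimate_nn_heap_bytes(num_files=10_000_000, num_dirs=1_000_000)
    print("approx. NN heap for metadata: %.2f GiB" % to_gib(heap))
```

Note that raw disk capacity never appears in the estimate: many small files
cost far more NN memory than the same number of bytes stored in a few large
files.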


> So when I start the cluster, would I start it from the NN/JT machine?  Could
> it also be started from any of the other cluster members?
>
>
It doesn't matter - the Hadoop daemons themselves don't rely on SSH or
anything like it. They just all have to be started somehow. If you're using
the Cloudera distribution with RPM/Deb packages you can use the init scripts.
If you prefer shell scripts and ssh you can use the provided start-all
scripts, your own scripts, or something like pdsh or Capistrano's cap shell.
If you're a masochist you can log into each node individually and start the
daemons by hand. I do not recommend this last option :)
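
As an illustration of the "your own scripts" route, here is a minimal sketch
that builds one ssh command per daemon and, by default, only prints them. The
host names and the install path are hypothetical placeholders, and it assumes
passwordless ssh to every node plus the stock bin/hadoop-daemon.sh script at
the same location on each machine.

```python
import subprocess

# All values below are hypothetical placeholders, not taken from this thread.
HADOOP_HOME = "/usr/lib/hadoop"      # assumed install path on every node
MASTER = "master01"                  # assumed host running the NN and JT
WORKERS = ["worker01", "worker02"]   # assumed hosts running a DN and TT each


def build_start_commands(master, workers, hadoop_home=HADOOP_HOME):
    """Return one ssh argv list per (host, daemon) pair, master daemons first."""
    plan = [(master, "namenode"), (master, "jobtracker")]
    for host in workers:
        plan.append((host, "datanode"))
        plan.append((host, "tasktracker"))
    script = hadoop_home + "/bin/hadoop-daemon.sh"
    return [["ssh", host, script, "start", daemon] for host, daemon in plan]


def start_cluster(commands, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.check_call(argv)


if __name__ == "__main__":
    start_cluster(build_start_commands(MASTER, WORKERS))
```

Because each command names its target host explicitly, a script like this can
be run from any machine that can ssh to the others, not just the NN/JT box.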


> Sorry for all of the seemingly basic questions, but I want to get it right
> the first time :)
>

Sure thing - we're here to help.

-Todd


>
>
> On Nov 9, 2009, at 1:11 PM, Todd Lipcon wrote:
>
>> On Mon, Nov 9, 2009 at 7:20 AM, John Martyniak
>> <john@beforedawnsolutions.com> wrote:
>>
>>> Can the NameNode/DataNode & JobTracker/TaskTracker run on a server that
>>> isn't part of the "cluster" meaning I would like to run it on a machine
>>> that wouldn't participate in the processing of data, and wouldn't
>>> participate in the HDFS data sharing, and would solely focus on the
>>> NameNode/DataNode & JobTracker/TaskTracker tasks.
>>
>> Yes, running the NN and the JT on servers that don't also run TT/DN is very
>> common and recommended for clusters of more than maybe 5 nodes.
>>
>> -Todd
>
>

--00504502b08cc5823b04780edea1--