From: Todd Lipcon
Date: Tue, 10 Nov 2009 18:05:49 -0800
Message-ID: <45f85f70911101805v32e45aeag7f65ebe2794eeb3d@mail.gmail.com>
Subject: Re: NameNode/DataNode & JobTracker/TaskTracker
To: common-user@hadoop.apache.org

On Mon, Nov 9, 2009 at 1:04 PM, John Martyniak wrote:

> Thanks Todd.
>
> I wasn't sure if that is possible. But you pointed out an important point
> and that is it is just NN and JT that would run remotely.
>
> So in order to do this would I just install the complete Hadoop instance on
> each one? And then would they be configured as masters?
>
> Or should NameNode and JobTracker run on the same machine? So there would
> be one master.
>

Either way. On all clusters but the largest, the NN and JT are not
significant users of CPU. On medium-sized clusters they can start to use
multiple GBs of RAM. If you're using fewer than 30 nodes you can *probably*
get by with one machine for both; I say probably because it depends not just
on your total capacity but also on the number of files you have. There are
some rough sizing estimates if you google the archives for "CompressedOops",
I think; someone did some measurements of the NN's memory requirements.

> So when I start the cluster would I start it from the NN/JT machine? Could
> it also be started from any of the other cluster members?
>

It doesn't matter. Hadoop itself doesn't use SSH or anything; the daemons
just all have to be started somehow. If you're using the Cloudera
distribution with RPM/Deb you can use init scripts. If you prefer shell
scripts and ssh you can use the provided start-all scripts, your own scripts,
or something like pdssh or cap shell. If you're a masochist you can log into
each node individually and start the daemons by hand. I do not recommend this
last option :)

> sorry for all of the seemingly basic questions, but want to get it right
> the first time:)
>

Sure thing, we're here to help.
-Todd

>
> On Nov 9, 2009, at 1:11 PM, Todd Lipcon wrote:
>
>> On Mon, Nov 9, 2009 at 7:20 AM, John Martyniak
>> <john@beforedawnsolutions.com> wrote:
>>
>>> Can the NameNode/DataNode & JobTracker/TaskTracker run on a server that
>>> isn't part of the "cluster" meaning I would like to run it on a machine
>>> that wouldn't participate in the processing of data, and wouldn't
>>> participate in the HDFS data sharing, and would solely focus on the
>>> NameNode/DataNode & JobTracker/TaskTracker tasks.
>>
>> Yes, running the NN and the JT on servers that don't also run TT/DN is
>> very common and recommended for clusters of more than maybe 5 nodes.
>>
>> -Todd
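
A rough illustration of the sizing point in the reply above, namely that NN
memory depends on the number of files and not just on total capacity. This is
only a sketch: the figure of about 150 bytes of heap per file, directory, or
block is a commonly quoted rule of thumb, not a number from this thread, and
the function and parameter names are made up for the example.

```python
# Back-of-the-envelope NameNode heap estimate.
# ASSUMPTION: roughly 150 bytes of heap per namespace object (file,
# directory, or block). This is a rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files, num_dirs, avg_blocks_per_file,
                                 bytes_per_object=BYTES_PER_OBJECT):
    """Return an approximate NameNode heap requirement in bytes."""
    # Every file contributes one object for itself plus one per block.
    num_blocks = num_files * avg_blocks_per_file
    total_objects = num_files + num_dirs + num_blocks
    return int(total_objects * bytes_per_object)


def to_gib(num_bytes):
    """Convert a byte count to GiB for easier reading."""
    return num_bytes / float(1024 ** 3)


if __name__ == "__main__":
    # Hypothetical namespace: 10 million files, 1 million directories,
    # 1.5 blocks per file on average.
    heap = estimate_namenode_heap_bytes(10000000, 1000000, 1.5)
    print("Estimated NameNode heap: %.2f GiB" % to_gib(heap))
```

Note that raw disk capacity does not appear in the estimate at all: the same
number of terabytes stored as many small files needs far more NN memory than
when stored as a few large ones, which is why node count alone can't settle
whether one machine is enough for both NN and JT.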