From: "Indranil Majumder (imajumde)"
To: "user@hadoop.apache.org"
Subject: Hadoop setup doubts
Date: Sat, 14 Dec 2013 10:33:34 +0000

I started with Hadoop a few days ago and have a few doubts about the setup:

 

1. For the NameNode I format the name directory. Is it recommended to do the same for the DataNode directories too?

2. How does log aggregation work?

3. Does the ResourceManager run on every node (both NameNode and DataNode), or can it run on a separate node?

4. What is the purpose of the WebProxy? Is it really required?

5. Is there any documentation on how to decide which scheduler type to use, based on certain parameters?

6. What is the recommended way of pushing data into a Hadoop cluster and submitting MapReduce jobs, i.e. should we use a separate client node, and if so, is there any client daemon to run on it?

7. For the following nodes in clustered mode:

A. NameNode

B. Secondary NameNode

C. DataNode (2)

D. Resource Manager

E. WebProxy

F. History Server (MapReduce)

I want to write a PID monitor. Does anybody have the list of processes that would run on this cluster when fully operational? [Maybe the output of ps -ef | grep "somekeyword" will do.]

 

Thanks & Regards,

Indranil
