From: "David Parks"
Subject: RE: About configuring cluster setup
Date: Wed, 15 May 2013 14:50:18 +0700

We have a box that’s a bit overpowered for just running our namenode and jobtracker on a 10-node cluster and, like you, we also wanted to make use of the storage and processor resources of that node.

 

What we did is use LXC containers to segregate the different processes. LXC is a very lightweight pseudo-virtualization platform for Linux (near-zero overhead).

 

The key benefit of LXC, in this case, is that we can use Linux cgroups (standard, simple config in LXC) to specify that the container/VM running the namenode/jobtracker should get 10x the CPU and I/O resources of the container that runs a tasktracker/datanode (though since LXC containers all run under the same kernel, any “unused” resources are assigned to runnable processes).

 

We run Cloudera Hadoop and deployed a slightly modified tasktracker configuration on the shared box (fewer task slots so as not to overutilize memory).

 

That tasktracker doesn’t do as much work as the dedicated nodes, but it does a fair share, and the cgroup configurations (cpu.shares & blkio.weight for the curious) ensure that the bulk processing doesn’t interfere with the critical namenode & jobtracker systems.
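
For anyone wondering how a proportional weight such as cpu.shares plays out, here is a minimal Python sketch of the arithmetic only. It is not a model of the kernel scheduler, and the weights (10240 and 1024), the group names, and the `cpu_split` helper are all hypothetical, chosen just to match the 10x ratio described above. As far as I know, blkio.weight follows the same proportional idea for disk I/O.

```python
def cpu_split(shares, runnable):
    """Return each runnable group's fraction of CPU under full contention.

    shares:   dict of group name -> cpu.shares-style relative weight
    runnable: set of group names that currently have work to do
    """
    # Only groups with runnable work compete; idle groups give up their share.
    active = {name: w for name, w in shares.items() if name in runnable}
    total = sum(active.values())
    if total == 0:
        return {}
    return {name: w / total for name, w in active.items()}


# Hypothetical weights: the master container gets 10x the worker's weight.
shares = {"master": 10240, "worker": 1024}

both_busy = cpu_split(shares, {"master", "worker"})
worker_only = cpu_split(shares, {"worker"})

print(both_busy)    # master gets 10/11 of the CPU, worker gets 1/11
print(worker_only)  # worker gets all of the CPU while the master is idle
```

The second call shows why the shared tasktracker still does a fair share of work: the weights only matter when both containers want the CPU at the same time.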

 

 

From: Robert Dyer [mailto:psybers@gmail.com]
Sent: Tuesday, May 14, 2013 11:23 PM
To: user@hadoop.apache.org
Subject: Re: About configuring cluster setup

 

You can; however, note that unless you also run a TaskTracker on that node (bad idea), any blocks that are replicated to this node won't be available as input to MapReduce jobs, and you are lowering the odds of having data locality on those blocks.

 

On Tue, May 14, 2013 at 2:01 AM, Ramya S <ramyas@suntecgroup.com> wrote:

Hi,

 

Can we configure 1 node as both Name node and Data node?
