Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: local policy)
From: "David Parks" <davidparks21@yahoo.com>
To: <user@hadoop.apache.org>
References: <DF3D307645E7C8479A02AD016BAAAD0E0B24B3DF@sbssvex20.suntecsbs.com>
 <CAMiz3FMzvnigk5fuKPhB5TEzgyZiZPT8WGzVONG703X+KjCb9A@mail.gmail.com> 
In-Reply-To: 
Subject: RE: About configuring cluster setup
Date: Wed, 15 May 2013 14:50:18 +0700
Message-ID: <0b4b01ce5140$da71c910$8f555b30$@yahoo.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0B4C_01CE517B.86D200A0"
Thread-Index: AQHyXB/Oo6rj6DPT0MErR0FR6MCuLAJyjZCJmKovkgCAAAVmEA==
Content-Language: en-us

This is a multipart message in MIME format.

------=_NextPart_000_0B4C_01CE517B.86D200A0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

We have a box that's a bit overpowered for running just the namenode and
jobtracker of our 10-node cluster, and, like you, we wanted to make use of
that node's storage and processor resources as well.

What we did was use LXC containers to segregate the different processes.
LXC is a very lightweight pseudo-virtualization platform for Linux, with
near-zero overhead.

The key benefit of LXC in this case is that we can use Linux cgroups
(standard, simple configuration in LXC) to give the container running the
namenode/jobtracker 10x the CPU and IO weight of the container running the
tasktracker/datanode. Since all LXC containers run under the same kernel,
any "unused" resources are still handed to whatever processes are runnable,
so the weighting only takes effect when the containers compete for the box.
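
To make the 10x weighting concrete, here is a small Python model of how
relative cpu.shares weights divide a saturated CPU. It is a simplification
(it ignores CPU counts, per-core placement and scheduling latency), and the
container names and share values are hypothetical; 1024 is the kernel's
default cpu.shares value.

```python
from typing import Dict, Set


def cpu_fractions(shares: Dict[str, int], runnable: Set[str]) -> Dict[str, float]:
    """Approximate CPU fraction each container gets when the CPU is saturated.

    Simplified model of cgroup (v1) cpu.shares as relative weights: only
    containers with runnable work compete, and they split the CPU in
    proportion to their shares. Idle containers get nothing, so their
    share flows to the containers that are busy.
    """
    active = {name: w for name, w in shares.items() if name in runnable}
    total = sum(active.values())
    return {
        name: (active[name] / total if name in active else 0.0)
        for name in shares
    }


# Hypothetical weights giving the master container 10x the worker container.
shares = {"master": 10240, "worker": 1024}

busy = cpu_fractions(shares, runnable={"master", "worker"})
idle_master = cpu_fractions(shares, runnable={"worker"})

print(f"both busy:   master={busy['master']:.3f} worker={busy['worker']:.3f}")
print(f"master idle: master={idle_master['master']:.3f} worker={idle_master['worker']:.3f}")
```

With both containers busy the model gives about 0.909 vs 0.091 (the 10:1
split); when the master container is idle, the worker gets the whole CPU,
which is the "unused resources go to runnable processes" behaviour above.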

We run Cloudera's Hadoop distribution and deployed a slightly modified
tasktracker configuration on the shared box, with fewer task slots so as not
to overcommit memory.
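
One way to pick the reduced slot count is to budget memory per task JVM.
The Python sketch below uses a hypothetical rule of thumb (per-task heap
plus an assumed fixed non-heap overhead) and made-up machine sizes; it is not
the configuration we actually used. In MRv1 the resulting total would then be
split between mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum, with the per-task heap set through
mapred.child.java.opts.

```python
def max_task_slots(total_ram_mb: int, reserved_mb: int, child_heap_mb: int,
                   overhead_mb: int = 256) -> int:
    """Hypothetical rule of thumb: how many task JVMs fit in the leftover RAM.

    reserved_mb:   memory set aside for the OS and co-located daemons
                   (namenode, jobtracker, datanode, tasktracker).
    child_heap_mb: -Xmx given to each task JVM.
    overhead_mb:   assumed non-heap overhead per task JVM.
    """
    available = total_ram_mb - reserved_mb
    per_task = child_heap_mb + overhead_mb
    return max(0, available // per_task)


# Same hypothetical 48 GB machine, used as a dedicated worker vs. the shared
# master box that also has to hold the namenode and jobtracker heaps.
dedicated = max_task_slots(total_ram_mb=48 * 1024, reserved_mb=4 * 1024, child_heap_mb=1024)
shared = max_task_slots(total_ram_mb=48 * 1024, reserved_mb=24 * 1024, child_heap_mb=1024)

print(f"dedicated worker: {dedicated} slots, shared box: {shared} slots")
```

With these made-up numbers the shared box ends up with 19 task slots versus
35 on an otherwise identical dedicated worker.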

That tasktracker doesn't do as much work as the other, dedicated nodes, but
it still does a fair amount, and the cgroup settings (cpu.shares and
blkio.weight, for the curious) ensure that the bulk processing doesn't
interfere with the critical namenode and jobtracker processes.
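
For the curious, the Python sketch below renders those two settings as LXC
container config lines, using the lxc.cgroup.<controller>.<key> form LXC
uses for cgroup v1 controllers. The 10:1 values are hypothetical, not the
ones from our cluster, and blkio.weight only has an effect under an I/O
scheduler that supports proportional weights, such as CFQ.

```python
from typing import List


def lxc_cgroup_lines(cpu_shares: int, blkio_weight: int) -> List[str]:
    """Render LXC config lines setting cgroup v1 CPU and block-IO weights."""
    return [
        f"lxc.cgroup.cpu.shares = {cpu_shares}",
        f"lxc.cgroup.blkio.weight = {blkio_weight}",
    ]


# Hypothetical 10:1 weighting between the master and worker containers.
master = lxc_cgroup_lines(cpu_shares=10240, blkio_weight=1000)
worker = lxc_cgroup_lines(cpu_shares=1024, blkio_weight=100)

print("# master container (namenode/jobtracker)")
print("\n".join(master))
print("# worker container (tasktracker/datanode)")
print("\n".join(worker))
```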

From: Robert Dyer [mailto:psybers@gmail.com] 
Sent: Tuesday, May 14, 2013 11:23 PM
To: user@hadoop.apache.org
Subject: Re: About configuring cluster setup

You can; however, note that unless you also run a TaskTracker on that node
(a bad idea), any blocks replicated to this node won't be available as local
input to MapReduce tasks, so you lower the odds of data locality for those
blocks.

On Tue, May 14, 2013 at 2:01 AM, Ramya S <ramyas@suntecgroup.com> wrote:

Hi,

Can we configure 1 node as both NameNode and DataNode?


------=_NextPart_000_0B4C_01CE517B.86D200A0--