From: Mohammad Tariq <dontariq@gmail.com>
Date: Thu, 14 Feb 2013 23:57:00 +0530
Subject: Re: Host NameNode, DataNode, JobTracker or TaskTracker on the same machine
To: user@hadoop.apache.org

With the current configuration you are safe. But as your data grows you will consume more space, and you might eventually end up with insufficient space to hold the metadata itself, since it is stored on the same disk. Also, bigger data means more files and blocks, which means more objects, which in turn means greater memory consumption. And don't forget the resource consumption of your processing layer: the disk space required to store intermediate output files, the resources required to launch map and reduce tasks, and so on.

But it all depends on the size of your data and the intensity of the processing you are going to perform. As of now, you look good to me with 128 TB + 64 GB.

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 14, 2013 at 11:35 PM, Jeff LI wrote:
> Thanks for your response. I'm running SNN on another machine.
>
> Could you explain a bit more on why I may run out of memory or disk?
>
> I understand that the NameNode holds file system metadata in memory. I found
> through this post (
> http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
> ) that, as a rule of thumb,
> 1 GB metadata ≈ 1 PB physical storage
>
> Currently, my cluster has about 128 TB of disk storage in total and 64 GB of
> memory on each machine. Does this suggest that I'm protected against
> running out of memory from metadata?
>
> Thanks
>
> Cheers
>
> Jeff
>
>
> On Thu, Feb 14, 2013 at 12:41 PM, Tariq wrote:
>
>> You may run out of memory, or out of disk. If the SNN is also running on the same
>> machine, then you are totally screwed in case of any breakdown.
>>
>> shashwat shriparv wrote:
>>
>>> If you are doing this for production, all the processes should run on
>>> separate machines, as that reduces the load on each machine.
>>>
>>> ∞
>>> Shashwat Shriparv
>>>
>>> On Thu, Feb 14, 2013 at 10:40 PM, Jeff LI wrote:
>>>
>>>> Hello,
>>>>
>>>> Is there a good reason that we should not host NameNode, DataNode,
>>>> JobTracker or TaskTracker services on the same machine?
>>>>
>>>> Not doing so is suggested here:
>>>> http://wiki.apache.org/hadoop/NameNode
>>>> but I'd like to know the reasoning behind this.
>>>>
>>>> Thanks
>>>>
>>>> Cheers
>>>>
>>>> Jeff
>>
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
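The rule of thumb quoted above can be sanity-checked with some back-of-envelope arithmetic. The sketch below is only an estimate under stated assumptions: the ~150 bytes per NameNode object, the 128 MB block size, and 3x replication are common defaults and community figures, not measurements from this cluster.

```python
# Rough NameNode heap estimate for HDFS metadata.
# Assumptions (not from the thread): ~150 bytes of heap per namespace
# object, one block per file as a simple upper bound on object count,
# 128 MB blocks, and replication factor 3.

def estimated_heap_gb(total_storage_tb, block_size_mb=128, replication=3):
    """Estimate NameNode heap (GB) if the cluster's raw storage fills up."""
    usable_tb = total_storage_tb / replication       # raw -> logical data
    blocks = usable_tb * 1024 * 1024 / block_size_mb # logical TB -> block count
    objects = 2 * blocks                             # ~1 file + 1 block object each
    return objects * 150 / 1024**3                   # bytes -> GB

# Jeff's cluster: 128 TB raw storage, 64 GB RAM per machine.
print(f"{estimated_heap_gb(128):.2f} GB of heap for metadata")
```

Even this conservative estimate lands well under 1 GB of metadata for 128 TB of raw storage, which is consistent with the 1 GB ≈ 1 PB rule of thumb and with Tariq's conclusion that 64 GB of RAM is comfortable for now.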