Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of nitinpawar432@gmail.com
 designates 209.85.215.45 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <1365755432.10219.YahooMailNeo@web190701.mail.sg3.yahoo.com>
References: <1364377874.13753.YahooMailNeo@web194703.mail.sg3.yahoo.com>
	<CADfVb56UdbZCsDd=K7+ruYcJ_veVYEQVAdKouYJChT32Uvv5+g@mail.gmail.com>
	<1364577771.12724.YahooMailNeo@web194704.mail.sg3.yahoo.com>
	<1364719534.91394.YahooMailNeo@web194703.mail.sg3.yahoo.com>
	<1365042870.89547.YahooMailNeo@web194702.mail.sg3.yahoo.com>
	<1365740112.75877.YahooMailNeo@web190702.mail.sg3.yahoo.com>
	<CAH4HbuEQPR=ec6P_km1bO19jB4sTPF_YQdFKa=ueZPNmWDsU7g@mail.gmail.com>
	<578478094-1365742754-cardhu_decombobulator_blackberry.rim.net-2001349931-@b16.c6.bise7.blackberry>
	<1365754060.82642.YahooMailNeo@web190701.mail.sg3.yahoo.com>
	<1365754228.40174.YahooMailNeo@web190705.mail.sg3.yahoo.com>
	<1365754910.18961.YahooMailNeo@web190702.mail.sg3.yahoo.com>
	<1365755432.10219.YahooMailNeo@web190701.mail.sg3.yahoo.com>
Date: Fri, 12 Apr 2013 14:27:09 +0530
Message-ID: 
 <CAORpBsiysRYuKtCtZZ0MGHDmPB=cgvdVz05-AM113fsH3MHy=A@mail.gmail.com>
Subject: Re: Will HDFS refer to the memory of NameNode & DataNode or is it a
 separate machine
From: Nitin Pawar <nitinpawar432@gmail.com>
To: user@hadoop.apache.org, Sai Sai <saigraph@yahoo.in>
Content-Type: multipart/alternative; boundary=047d7b3a824c96b2dd04da2616a1

--047d7b3a824c96b2dd04da2616a1
Content-Type: text/plain; charset=ISO-8859-1

HDFS stands for Hadoop Distributed File System.
As the name says, it is a file system. The first basic question to look
into is: do you need a process to run a file system?
Once you have an answer to that, the second question is: will a single
process be enough for a distributed system, where sub-components of the
system may live on different machines?

The NameNode and the DataNodes together make up HDFS. There is no separate
HDFS process; the combination of their processes is HDFS.

The NameNode is the master of HDFS and keeps the file system image in
memory. When it starts, it loads the image into memory and serves all
requests from memory from then on. The FSImage is also saved to disk at
certain points. You can read about this in detail in the HDFS architecture
documentation.
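
To make the "metadata in memory, image on disk" idea concrete, here is a
rough Python sketch. It is only a toy model and not how the NameNode is
actually implemented: the ToyNameNode class, the JSON image format, the
block ids and the /user/sai/data.txt path are all made up for illustration.

```python
import json
import os
import tempfile


class ToyNameNode:
    """Hypothetical toy model: file metadata lives in memory and can be
    written to / reloaded from an image file on disk. The real FSImage
    format and checkpointing process are different."""

    def __init__(self):
        # path -> list of block ids; this is metadata only, never file data
        self.namespace = {}

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def lookup(self, path):
        # Requests are answered from the in-memory structure
        return self.namespace[path]

    def save_image(self, image_path):
        # Persist the in-memory metadata so it survives a restart
        with open(image_path, "w") as f:
            json.dump(self.namespace, f)

    @classmethod
    def load_image(cls, image_path):
        # On startup, load the saved image back into memory
        nn = cls()
        with open(image_path) as f:
            nn.namespace = json.load(f)
        return nn


nn = ToyNameNode()
nn.add_file("/user/sai/data.txt", ["blk_1", "blk_2"])

image = os.path.join(tempfile.mkdtemp(), "fsimage.json")
nn.save_image(image)

restarted = ToyNameNode.load_image(image)
print(restarted.lookup("/user/sai/data.txt"))
```

The point of the sketch is only that lookups never touch the disk; the
image file exists so the metadata can be rebuilt after a restart.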

When you put a file into HDFS, it may or may not go to a single machine.
The NameNode never stores the data files; it only stores the metadata for
HDFS.
So when you load a file, the data goes to DataNodes and the file
information goes to the NameNode. Depending on its size, the file is split
into multiple blocks, and those blocks may land on different DataNodes. If
your file size is less than or equal to the block size, the file is a
single block and you can find out which DataNode it is located on.
Otherwise, in fully distributed mode, there is no guarantee that the whole
file sits on a single node.
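
As a rough illustration of the splitting described above, here is a small
Python sketch. The 64 MB block size, the DataNode names dn1..dn3 and the
round-robin placement are assumptions for the example only; HDFS decides
placement with its own policy and also replicates blocks, which this
sketch ignores.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes
        sizes.append(remainder)
    return sizes


def place_blocks(num_blocks, datanodes):
    """Toy round-robin placement of block index -> DataNode name.
    This is NOT the real HDFS placement policy, just an illustration."""
    return {i: datanodes[i % len(datanodes)] for i in range(num_blocks)}


MB = 1024 * 1024
blocks = split_into_blocks(150 * MB, 64 * MB)
print(len(blocks), [b // MB for b in blocks])   # 3 [64, 64, 22]
print(place_blocks(len(blocks), ["dn1", "dn2", "dn3"]))
```

A file of at most one block size yields a single-element list, so it has
one block to locate; a larger file yields several blocks, and they can
end up on different nodes.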

PS: this is my understanding. Others may correct me as well


On Fri, Apr 12, 2013 at 2:00 PM, Sai Sai <saigraph@yahoo.in> wrote:

> A few basic questions:
>
> Will HDFS refer to the memory of NameNode & DataNode or is it a separate
> machine.
>
> For NameNode, DataNode and others there is a process associated with each
> of em.
> But no process is for HDFS, wondering why? I understand that fsImage has
> the meta data of the HDFS, so when NameNode or DataNode or JobTracker/TT
> needs to get file info will they just look into the fsImage.
>
> When we put a file in HDFS is it possible to look/find in which node
> (NN/DN) it physically sits.
>
> Any help is appreciated.
> Thanks
> Sai
>


-- 
Nitin Pawar

--047d7b3a824c96b2dd04da2616a1--