From: Sindhu Hosamane <sindhuht@gmail.com>
To: user@hadoop.apache.org
Subject: Re: How to make sure data blocks are shared between 2 datanodes
Date: Mon, 26 May 2014 20:08:44 +0200

OK, thanks for that information.
As I said, I am running 2 datanodes on the same machine, so my Hadoop home has 2 conf folders,
conf and conf2, and in turn an hdfs-site.xml in each conf folder.
I guess the dfs.replication value in the hdfs-site.xml of the conf folder should be 3.
What should I have in conf2? Should it be 1 there?
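To make the question concrete, this is the kind of setting I mean in each hdfs-site.xml (just a sketch; the value 2 is only my guess from having 2 datanodes, not something confirmed in this thread):

  <?xml version="1.0"?>
  <!-- hdfs-site.xml (sketch): dfs.replication is read by the client at
       file creation time, so both conf folders would normally carry the
       same value. With only 2 datanodes, any value above 2 would leave
       blocks under-replicated. -->
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
  </configuration>

(As far as I understand, this only affects files written afterwards; existing files keep the replication they were created with unless it is changed with hadoop fs -setrep.)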

Sorry if the question sounds stupid, but I am unfamiliar with this kind of setup (2 datanodes on the same machine, hence 2 conf folders).


If data is split across multiple datanodes, then processing capacity would be improved (that is my guess). Since my file is only 240 KB, it occupies only one block; it cannot occupy a second block on the other datanode.
So now, does it make sense to reduce the block size so that blocks are split between the 2 datanodes, if I want to take full advantage of multiple datanodes?
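Concretely, I was thinking of first checking where the blocks of the file actually live, and then re-uploading with a smaller block size (the path and the 128 KB figure are only examples I made up; 128 KB is small enough that a 240 KB file spans two blocks, though such tiny blocks are for experiments only):

  # list each block of the file and the datanodes holding its replicas
  hadoop fsck /user/sindhu/data.csv -files -blocks -locations

  # re-upload with a per-file block size override (dfs.block.size is the
  # Hadoop 1.x property name; the value must be a multiple of 512)
  hadoop fs -D dfs.block.size=131072 -put data.csv /user/sindhu/data-small.csv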


Best Regards,
Sindhu


On 25 May 2014, at 21:47, Peyman Mohajerian <mohajeri@gmail.com> wrote:

Block sizes are typically 64 MB or 128 MB, so in your case only a single block is involved, which means that with a single replica only a single datanode will be used. The default replication factor is three, and since you only have two datanodes, you will most likely have two copies of the data on two separate datanodes.


On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindhuht@gmail.com> wrote:

Hello Friends, 

I am running multiple datanodes on a single machine.

The output of the jps command shows:
NameNode   DataNode   DataNode   JobTracker   TaskTracker   SecondaryNameNode

which confirms that 2 datanodes are up and running. I execute Cascalog queries on this 2-datanode Hadoop cluster, and I get the results of the queries too.
I am not sure if it is really using both datanodes (because I would get results with one datanode anyway).
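(If I understand the docs right, one way to check is hadoop dfsadmin -report, which should list both datanodes along with their configured and used capacity; sketched here from the Hadoop 1.x command set:)

  # prints overall cluster capacity plus one section per live datanode
  hadoop dfsadmin -report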

(I read somewhere about HDFS storing data in datanodes, along these lines:)
1) HDFS might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold.
2) Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes.

My doubts are:
* Do I have to make any configuration changes in Hadoop to tell it to share data blocks between 2 datanodes, or does it do so automatically?
* Also, my test data is not too big; it is only 240 KB. According to point 1), I don't know if such small test data can trigger automatic movement of data from one datanode to another.
* Also, what should the dfs.replication value be when I am running 2 datanodes? (I guess it's 2.)


Any advice or help would be very much appreciated.

Best Regards,
Sindhu
