hadoop-mapreduce-user mailing list archives

From Sindhu Hosamane <sindh...@gmail.com>
Subject Re: How to make sure data blocks are shared between 2 datanodes
Date Mon, 26 May 2014 18:28:12 GMT

OK, thanks a lot for that information.
As I said, I am running 2 datanodes on the same machine, so my Hadoop home has 2 conf folders,
conf and conf2, and in turn an hdfs-site.xml in each conf folder.
I guess the dfs.replication value in hdfs-site.xml of the conf folder should be 3.
What should I have in conf2? Should it be 1 there?

Sorry if the question sounds stupid, but I am unfamiliar with this kind of setup (2 datanodes
on the same machine, and hence 2 conf folders).
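In case a concrete example helps, here is a minimal sketch of the relevant hdfs-site.xml fragment. This assumes both datanodes register with the same namenode: dfs.replication is just the default replication factor that clients request when creating files, so with only 2 live datanodes a value of 2 (rather than 3) is the most the cluster can actually satisfy, and you would normally keep it the same in both conf and conf2.

```xml
<!-- Sketch for conf/hdfs-site.xml and conf2/hdfs-site.xml.
     With only 2 live datanodes, a default replication of 2 is
     the most the cluster can actually place; a higher value
     would simply leave files marked under-replicated. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```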


If data is split across multiple datanodes, then processing capacity would be improved
(that's my guess). Since my file is only 240 KB, it occupies only one block, so it cannot
use a second block on the other datanode.
So now, does it make sense to reduce the block size so that the blocks are split between the
2 datanodes, if I want to take full advantage of multiple datanodes?
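For what it's worth, the HDFS block size is configurable. A sketch, assuming Hadoop 1.x, where the property is named dfs.block.size (in Hadoop 2.x it is dfs.blocksize): setting it to 128 KB would make a 240 KB file span 2 blocks. Note that tiny blocks are generally discouraged, since every block adds namenode metadata and task-scheduling overhead; and with replication 2, both datanodes already hold a copy of every block regardless of block size.

```xml
<!-- Sketch for conf/hdfs-site.xml, assuming Hadoop 1.x property names:
     128 KB blocks would split a 240 KB file into 2 blocks. -->
<property>
  <name>dfs.block.size</name>
  <value>131072</value> <!-- 128 * 1024 bytes -->
</property>
```

You can then check where the blocks actually landed with `hadoop fsck /path/to/file -files -blocks -locations`, which lists each block of the file and the datanodes holding its replicas.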

Any advice?


Best Regards,
Sindhu



On 25 May 2014, at 21:47, Peyman Mohajerian <mohajeri@gmail.com> wrote:

> Block sizes are typically 64 MB or 128 MB, so in your case only a single block is involved, which means that if you have a single replica then only a single datanode will be used. The default replication is three, and since you only have two datanodes, you will most likely have two copies of the data on two separate datanodes.
> 
> 
> On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindhuht@gmail.com> wrote:
> 
>>> Hello Friends, 
>>> 
>>> I am running multiple datanodes on a single machine.
>>> 
>>> The output of the jps command shows:
>>> Namenode    Datanode    Datanode    Jobtracker    Tasktracker    Secondary Namenode
>>> 
>>> Which assures that 2 datanodes are up and running. I execute Cascalog queries on this 2-datanode Hadoop cluster, and I get the results of the queries too.
>>> I am not sure if it is really using both datanodes (because I would get results with one datanode anyway).
>>> 
>>> (I read somewhere about HDFS storing data in datanodes as below:)
>>> 1) An HDFS scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold.
>>> 2) Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes.
>>> 
>>> My doubts are:
>>> * Do I have to make any configuration changes in Hadoop to tell it to share data blocks between the 2 datanodes, or does it do so automatically?
>>> * Also, my test data is not too big, only 240 KB. According to point 1), I don't know if such small test data can trigger automatic movement of data from one datanode to another.
>>> * Also, what should the dfs.replication value be when I am running 2 datanodes? (I guess it's 2.)
>>> 
>>> 
>>> Any advice or help would be very much appreciated.
>>> 
>>> Best Regards,
>>> Sindhu
>> 
>> 
> 
> 

