hadoop-user mailing list archives

From Sindhu Hosamane <sindh...@gmail.com>
Subject How to make sure data blocks are shared between 2 datanodes
Date Sun, 25 May 2014 19:40:13 GMT

>> Hello Friends,
>> 
>> I am running multiple datanodes on a single machine.
>> 
>> The output of the jps command shows:
>> NameNode     DataNode     DataNode     JobTracker     TaskTracker     SecondaryNameNode
>> 
>> which assures me that both datanodes are up and running. I execute Cascalog queries on
this 2-datanode Hadoop cluster, and I do get query results.
>> However, I am not sure whether it is really using both datanodes (because I would get results
with one datanode anyway).
>> 
>> (I read somewhere that HDFS stores data in datanodes as follows:)
>> 1) An HDFS scheme might automatically move data from one DataNode to another if the
free space on a DataNode falls below a certain threshold.
>> 2) Internally, a file is split into one or more blocks, and these blocks are stored
in a set of DataNodes.
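One way to see whether blocks have actually been placed on both datanodes is HDFS's fsck tool (command form from Hadoop 1.x, matching the JobTracker/TaskTracker setup above; the path is illustrative):

```
# Report each file with its blocks and the datanodes holding every replica.
hadoop fsck /user/sindhu -files -blocks -locations
```

Each block line in the report lists the datanode addresses holding a replica, so two distinct addresses per block would confirm that both datanodes are being used.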
>> 
>> My doubts are:
>> * Do I have to make any configuration changes in Hadoop to tell it to share data blocks
between the 2 datanodes, or does it do so automatically?
>> * Also, my test data is not big; it is only 240 KB. Going by point 1), I don't
know whether such small test data can trigger automatic movement of data from one datanode to
another.
>> * Also, what should the dfs.replication value be when I am running 2 datanodes? (I
guess it's 2.)
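For reference, the replication factor is a standard HDFS setting in conf/hdfs-site.xml; a minimal sketch (property name from the stock HDFS configuration, value 2 assuming one replica per datanode is wanted):

```xml
<!-- conf/hdfs-site.xml: keep one replica of each block on each of the 2 datanodes -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

Note that dfs.replication only applies to files written after the change; existing files keep the replication factor they were created with unless it is changed explicitly (e.g. with `hadoop fs -setrep`).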
>> 
>> 
>> Any advice or help would be very much appreciated .
>> 
>> Best Regards,
>> Sindhu

