hadoop-hdfs-user mailing list archives

From bharath vissapragada <bharathvissapragada1...@gmail.com>
Subject Re: Understanding harpoon - help needed
Date Wed, 23 Jan 2013 09:44:32 GMT

Link [1] partly answers your question. The Namenode chooses the "nearest"
Datanode that can serve the request. So replication definitely helps, in
the sense that a replica might be placed on a node nearer to the client.
I'm not sure whether the Namenode checks if a Datanode is busy serving
other requests, so I'll leave that for others to answer.
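
To make the "nearest replica" idea concrete, here is a rough sketch in Python (not actual HDFS code). It assumes each node is described by a rack-style topology path such as "/rack1/dn1", and it measures distance as the number of hops through the closest common ancestor, so the same node is 0, the same rack is 2, and a different rack is 4. The function names and the node paths are made up for illustration only.

```python
def distance(a, b):
    """Hops between two nodes given hypothetical topology paths like '/rack1/dn1'."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    # Count the leading path components that both nodes share.
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Climb from each node up to the closest common ancestor.
    return (len(pa) - common) + (len(pb) - common)


def order_replicas(reader, replicas):
    """Return replica locations sorted nearest-first relative to the reader."""
    # sorted() is stable, so equally distant replicas keep their input order.
    return sorted(replicas, key=lambda r: distance(reader, r))


if __name__ == "__main__":
    reader = "/rack1/client1"
    replicas = ["/rack2/dn4", "/rack1/dn1", "/rack2/dn5"]
    print(order_replicas(reader, replicas))
```

Note that nothing in this sketch looks at how busy a Datanode is. It only ranks replicas by topology distance, which is the part the design document describes.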

[1] http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html#Replica+Selection


On Wed, Jan 23, 2013 at 2:54 PM, Dibyendu Karmakar wrote:

> Hi,
> I am doing some performance testing on Hadoop and have run into a
> situation where I need your help.
> My Hadoop cluster:
> 6 Datanodes, 1 Namenode, 2 Clients.
> Replication factor = 3
> 2 clients write a file (put operation) whose size is 2 x block size.
> The dfs.data.dir capacity is the same on every Datanode and equals the
> block size. That means each Datanode stores a single block.
> Now, if the 2 clients read the file simultaneously (get operation),
> will they read the 2 blocks from different Datanodes,
> or will they read from the same Datanodes?
> Does the Namenode know which Datanode is busy and which one is idle?
> What I am trying to find out is:
> is it possible to decrease the read time by increasing the replication factor?
> I have attached an image to help explain my question. Kindly take a
> look. Thank you. And if possible, please give references.
