hadoop-hdfs-user mailing list archives

From stu24m...@yahoo.com
Subject Re: how does hdfs determine what node to use?
Date Thu, 10 Mar 2011 18:19:09 GMT
Actually I just meant to point out however many copies you have, the copies are placed on different
nodes. Although if you only have two nodes, there aren't a whole lot of options.. :)

I thought Rita was mainly worried if they all went to the same node - which would be bad.

Take care,
 -stu

-----Original Message-----
From: Ayon Sinha <ayonsinha@yahoo.com>
Date: Thu, 10 Mar 2011 07:41:17 
To: <hdfs-user@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: how does hdfs determine what node to use?

I think Stu meant that each block will have a copy on at most 2 nodes. 
Before Hadoop 0.20, rack awareness was not built into the algorithm that picks 
the replication nodes. With 0.20 and later, rack awareness does the following:
1. The first copy of the block is placed "randomly" on one of the least-loaded 
nodes. The next copy is then placed on another node on the same rack (to 
save network hops). 
2. If the replication factor is 3, it picks another node from another rack. 
This is done to provide redundancy in case an entire rack is unavailable due to 
a switch failure.
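The two steps above can be sketched as a toy simulation. This is my own illustration, not HDFS code: the function name is made up, and `random.choice` stands in for HDFS's real load-based and writer-local selection.

```python
import random

def place_replicas(nodes_by_rack, rep_factor):
    """Toy sketch of the 0.20-era default placement heuristic:
    replica 1 on a "random" node, replica 2 on a different node
    in the same rack, replica 3 (if any) on a different rack."""
    racks = list(nodes_by_rack)
    placement = []

    # Replica 1: the real policy prefers the writer's node or a
    # lightly loaded one; random.choice stands in for that here.
    first_rack = random.choice(racks)
    first_node = random.choice(nodes_by_rack[first_rack])
    placement.append((first_rack, first_node))

    # Replica 2: another node on the same rack, to save network hops.
    if rep_factor >= 2:
        same_rack = [n for n in nodes_by_rack[first_rack] if n != first_node]
        if same_rack:
            placement.append((first_rack, random.choice(same_rack)))

    # Replica 3: a node on a different rack, to survive a switch failure.
    if rep_factor >= 3:
        other_racks = [r for r in racks if r != first_rack]
        if other_racks:
            rack = random.choice(other_racks)
            placement.append((rack, random.choice(nodes_by_rack[rack])))

    return placement
```

Note that with a replication factor of 2, this heuristic never leaves the first rack, which is the point I'm guessing about below.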

So I am guessing that with a replication factor of 2, both copies will be on the 
same rack. It's quite possible that Hadoop has some switch somewhere to change 
this policy, because Hadoop has a switch for everything.
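For what it's worth, the "switch" that makes HDFS rack-aware in the first place is a topology script configured in core-site.xml. A minimal sketch (the script path is a placeholder; you'd write a script that prints a rack id like /rack0 for each host name or IP it is given):

```
<!-- core-site.xml: map DataNode hosts to racks.
     Path below is a placeholder for your own script. -->
<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```

Without this property set, HDFS treats every node as being on one default rack, so the rack-aware steps above collapse to "pick different nodes."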
 -Ayon




________________________________
From: Rita <rmorgan466@gmail.com>
To: hdfs-user@hadoop.apache.org; stu24mail@yahoo.com
Sent: Thu, March 10, 2011 5:37:08 AM
Subject: Re: how does hdfs determine what node to use?

Thanks Stu. I too was sure there was an algorithm. Is there a place where I can 
read more about it? I want to know whether it picks a node according to the load 
average or always picks "rack0" first.





On Wed, Mar 9, 2011 at 10:24 PM, <stu24mail@yahoo.com> wrote:

>There is an algorithm. Each block's copies should be on different nodes. In your 
>case, each block will have a copy on each of the nodes.
>
>Take care,
>-stu
________________________________

>From:  Rita <rmorgan466@gmail.com> 
>Date: Wed, 9 Mar 2011 22:07:37 -0500
>To: <hdfs-user@hadoop.apache.org>
>Reply-To: hdfs-user@hadoop.apache.org
>Subject: how does hdfs determine what node to use?
>
>I have a 2 rack cluster. All of my files have a replication factor of 2. How 
>does hdfs determine what node to use when serving the data? Does it always use 
>the first rack? or is there an algorithm for this?
>
>
>-- 
>--- Get your facts first, then you can distort them as you please.--
>


-- 
--- Get your facts first, then you can distort them as you please.--


