hadoop-common-dev mailing list archives

From: Kai Voigt <...@123.org>
Subject: Re: How HDFS decides where to put the block
Date: Tue, 19 Apr 2011 05:22:23 GMT

I found http://hadoopblog.blogspot.com/2009/09/hdfs-block-replica-placement-in-your.html explains
the process nicely.

The first replica of each block will be stored on the client machine, if it's a datanode itself.
Makes sense, as it doesn't require a network transfer. Otherwise, a random datanode will be
picked for the first replica.

The second replica will be written to a random datanode on a random rack other than the rack
where the first replica is stored. This is where HDFS's rack awareness comes into play, so
the data survives a rack failure.

The third replica will be written to the same rack as the second replica, but to another random
datanode in that rack. That keeps the pipeline hop between the second and third replicas quick.
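The three rules above can be simulated with a small sketch. This is a simplified model for illustration, not Hadoop's actual implementation; the topology format and function name are made up here:

```python
import random

def place_replicas(topology, writer=None):
    """Pick 3 datanodes for a block, mimicking HDFS's default policy.

    topology: dict mapping rack name -> list of datanode names.
    writer: the datanode the client runs on, or None if the client
            is not a datanode itself.
    """
    nodes = [(rack, dn) for rack, dns in topology.items() for dn in dns]

    # First replica: the writer's own node if the client is a datanode
    # (no network transfer needed), otherwise a random node.
    if writer is not None:
        first = next((rack, dn) for rack, dn in nodes if dn == writer)
    else:
        first = random.choice(nodes)

    # Second replica: a random node on a *different* rack, so a whole
    # rack can fail without losing the block (rack awareness).
    off_rack = [n for n in nodes if n[0] != first[0]]
    second = random.choice(off_rack)

    # Third replica: another node on the *same* rack as the second,
    # keeping the second->third pipeline hop inside one rack.
    same_rack = [n for n in nodes if n[0] == second[0] and n != second]
    third = random.choice(same_rack)

    return [first, second, third]
```

For example, with two racks of two datanodes each and the client running on dn1, the first replica always lands on dn1, the second on one of the rack2 nodes, and the third on the other rack2 node.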

Does that make sense to you? Note that this is the current hard-coded policy; there are ideas
to make it customizable (https://issues.apache.org/jira/browse/HDFS-385).
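As a sketch of what a pluggable policy could look like once HDFS-385 lands, one might select a custom placement class via a property in hdfs-site.xml. Both the property name and the class name below are assumptions for illustration, not something confirmed in this thread:

```xml
<!-- hdfs-site.xml: hypothetical switch to a custom placement policy -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault</value>
</property>
```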


Am 18.04.2011 um 15:46 schrieb Nan Zhu:

> Hi, all
> I'm confused by a question that "how does the HDFS decide where to put the
> data blocks "
> I mean that when the user invokes some command like "./hadoop put ***", and we
> assume that this file consists of 3 blocks, how does HDFS decide where to put
> these 3 blocks?
> Most of the materials don't cover this issue; they just introduce data
> replication when talking about blocks in HDFS.
> Can anyone give me some pointers?
> Thanks
> Nan
> -- 
> Nan Zhu
> School of Software,5501
> Shanghai Jiao Tong University
> 800,Dongchuan Road,Shanghai,China
> E-Mail: zhunansjtu@gmail.com

Kai Voigt
