hadoop-user mailing list archives

From Ruhua Jiang <ruhua.ji...@gmail.com>
Subject HDFS Block placement policy
Date Thu, 19 May 2016 14:29:56 GMT
Hi all,

I have a question about the HDFS block placement policy. The default is
described as:
"The default block placement policy is as follows: Place the first replica
somewhere – either a random node (if the HDFS client is outside the
Hadoop/DataNode cluster) or on the local node (if the HDFS client is
running on a node inside the cluster). Place the second replica in a
different rack"
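
To make the quoted rule concrete, here is a toy sketch in Python. It is not
HDFS code: the function name place_replicas, the nodes_by_rack layout and the
node names are all hypothetical, and it only models the two replicas that the
quote describes. It does not model full disks or any other fallback.

```python
import random


def place_replicas(nodes_by_rack, client_node=None, rng=random):
    """Toy model of the quoted rule; returns [first, second] replica nodes.

    nodes_by_rack: dict mapping a rack name to a list of node names
                   (hypothetical layout, not an HDFS data structure).
    client_node:   node the client runs on, or None if it is outside
                   the cluster.
    """
    # Reverse lookup: node name -> rack name.
    rack_of = {node: rack
               for rack, nodes in nodes_by_rack.items()
               for node in nodes}

    # First replica: the local node if the client is inside the cluster,
    # otherwise a random node.
    if client_node in rack_of:
        first = client_node
    else:
        first = rng.choice(sorted(rack_of))

    # Second replica: a random node on a different rack from the first.
    candidates = [node for node in sorted(rack_of)
                  if rack_of[node] != rack_of[first]]
    if not candidates:
        # The quoted rule says nothing about single-rack clusters,
        # so this sketch simply refuses to guess.
        raise ValueError("toy model needs at least two racks")
    second = rng.choice(candidates)

    return [first, second]


if __name__ == "__main__":
    cluster = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    # Client running on dn1: the first replica stays on dn1.
    print(place_replicas(cluster, client_node="dn1"))
    # Client outside the cluster: the first replica is a random node.
    print(place_replicas(cluster))
```

In this model a client on dn1 always writes its first replica to dn1, which
is exactly the situation my question is about.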

Consider the case where the data sits on the *local disk of one datanode* and
an *hdfs -put* command is used (which means the HDFS client is running on a
datanode) to ingest this data into HDFS.
- What will happen (in terms of block placement) if this datanode's local
disk is full?
- Is there a list of alternative block placement policies that are already
implemented, and can hdfs -put use one of them just by changing the
hdfs-site.xml config? I noticed the JIRA ticket
https://issues.apache.org/jira/browse/HDFS-385, but it does not seem to be
what we want.
- I understand that placing the first replica on the local machine can
improve performance, and that we can use the HDFS balancer to fix the
imbalance afterwards. However, I want to explore alternative solutions that
avoid this problem from the beginning.


Thanks

Ruhua Jiang
