hadoop-hdfs-user mailing list archives

From Gurmukh Singh <gurmukh.dhil...@yahoo.com.INVALID>
Subject Re: HDFS Block placement policy
Date Sun, 22 May 2016 11:43:07 GMT
The best practice is to have an Edge/Gateway node, so that there is no
local copy of data. It is also good from a security perspective.
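
On the question of alternative placement policies configurable via
hdfs-site.xml: recent Hadoop releases let you swap the policy class on
the NameNode. A sketch, assuming a version that ships
AvailableSpaceBlockPlacementPolicy (roughly Hadoop 2.8+); check the
property and class names against your release, and note a NameNode
restart is required:

    <!-- hdfs-site.xml on the NameNode -->
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy</value>
    </property>
    <property>
      <!-- fraction of the time a lower-utilized DataNode is preferred; range 0.0-1.0 -->
      <name>dfs.namenode.available-space-block-placement-policy.balanced-space-preference-fraction</name>
      <value>0.9</value>
    </property>

This policy biases block placement toward DataNodes with more free
space, which reduces the imbalance up front instead of relying on the
balancer afterwards.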

I think this video of mine can help you understand this better:


On 20/05/16 12:29 AM, Ruhua Jiang wrote:
> Hi all,
> I have a question related to the HDFS block placement policy. The default is:
> "The default block placement policy is as follows: Place the first 
> replica somewhere – either a random node (if the HDFS client is 
> outside the Hadoop/DataNode cluster) or on the local node (if the HDFS 
> client is running on a node inside the cluster). Place the second 
> replica in a different rack"
> Let's consider the situation where the data is on *1 datanode's local 
> disk*, and an *hdfs -put* command is used (which means the HDFS client 
> is on a datanode) to ingest this data into HDFS.
> - What will happen (in terms of block placement) if this datanode's 
> local disk is full?
> - Is there a list of available alternative block placement policy 
> implementations that hdfs -put can use just by changing the 
> hdfs-site.xml config? I noticed this 
> https://issues.apache.org/jira/browse/HDFS-385 
> JIRA ticket, but it does not seem to be what we want.
> - I understand that placing the first block on the local machine can 
> improve performance, and we can use the HDFS balancer to fix the 
> imbalance afterwards. However, I want to explore alternative 
> solutions that avoid this problem from the beginning.
> Thanks
> Ruhua Jiang

Thanks and Regards

Gurmukh Singh
