hadoop-common-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: should data be evenly distributed to each (physical) node
Date Thu, 04 Mar 2010 18:17:47 GMT
Hadoop's block placement policy is:

- If the writing client is not itself a datanode, place the replicas on randomly
  chosen datanodes. If there are multiple racks, not all replicas of a block may
  be on the same rack.
- If the writing client is itself a datanode, place the first replica on that node
  and the remaining replicas elsewhere. Again, if there are multiple racks, not all
  replicas may be on the same rack.
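The policy above can be sketched as a toy simulation (this is a deliberate simplification, not Hadoop's actual code; the `place_replicas` helper, node names, and rack map are made up for illustration):

```python
import random

def place_replicas(datanodes, racks, writer, replication):
    """Sketch of HDFS's default block placement (simplified).

    datanodes:   list of datanode names
    racks:       dict mapping node name -> rack id
    writer:      name of the writing client (may or may not be a datanode)
    replication: number of replicas to place
    """
    replicas = []
    # First replica: local to the writer if it is a datanode, else random.
    if writer in datanodes:
        replicas.append(writer)
    else:
        replicas.append(random.choice(datanodes))
    # Remaining replicas go on other nodes, chosen at random.
    remaining = [n for n in datanodes if n not in replicas]
    while len(replicas) < replication and remaining:
        candidate = random.choice(remaining)
        replicas.append(candidate)
        remaining.remove(candidate)
    # If there are multiple racks, not all replicas may share one rack:
    # move the last replica onto a different rack if necessary.
    if len(set(racks.values())) > 1 and len(replicas) > 1:
        if len({racks[n] for n in replicas}) == 1:
            off_rack = [n for n in datanodes
                        if racks[n] != racks[replicas[0]] and n not in replicas]
            if off_rack:
                replicas[-1] = random.choice(off_rack)
    return replicas
```

Note that with replication factor 1 and a datanode as the writer, the single replica always lands on the writer itself, which is exactly the behavior described below.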

I assume the problem is that you are running "dfs -put" from a datanode and that the
replication factor is 1. In that case all of the data is expected to end up on the
server you submitted it from. You might want to set the replication factor to 2, or
submit the data from a machine that is not a datanode.
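For reference, one way to do that (assuming the usual hdfs-site.xml configuration file; adjust for your install) is to set dfs.replication before loading the data:

```xml
<!-- hdfs-site.xml: replicate each block to two datanodes.
     This only affects files written after the change. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

Files that are already in HDFS can be re-replicated afterwards with something like "hadoop dfs -setrep -w 2 <path>".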

On Mar 4, 2010, at 7:25 AM, openresearch wrote:

> I am building a small two node cluster following
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> Everything seems to be working, except I notice the data is NOT evenly
> distributed across the physical boxes.
> e.g., when I hadoop dfs -put <6G> of data, I expect ~3G on each node
> (taking turns every ~64MB); however, I checked dfshealth.jsp and "du -k" on the
> local box, and found the uploaded data resides ONLY on the physical box
> where I ran "dfs -put". Doesn't that defeat the whole (data locality) purpose of
> hadoop?!
> Please help.
> Thanks
> -- 
> View this message in context: http://old.nabble.com/should-data-be-evenly-distributed-to-each-%28physical%29-node-tp27782215p27782215.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
