hadoop-user mailing list archives

From: Daryn Sharp <da...@yahoo-inc.com>
Subject: Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1
Date: Mon, 10 Jun 2013 13:53:33 GMT
It's normal.  The default placement policy writes the first replica of each block on the local
node (the node running the client, when that node is also a DataNode) for performance, then
chooses a second node on another rack, then a third node on the same rack as the second.  Using
a replication factor of 1 is not advised if you value your data.  However, if you want a better
distribution of blocks with 1 replica, consider uploading your files from a non-DataNode host.
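
For illustration (the file path below is only a placeholder), you can check where the blocks of
a file ended up with fsck, and override the configured replication factor for a single upload
with a -D option:

    # list every block of the file and the DataNode(s) that hold it
    hadoop fsck /user/razen/largefile -files -blocks -locations

    # copy the file in with a single replica, regardless of the cluster-wide default
    hadoop fs -D dfs.replication=1 -put largefile /user/razen/largefile

If that put is run from a host that is not a DataNode, the single replica of each block is
placed on a node chosen by the NameNode rather than on the local machine, so the file gets
spread across the cluster.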

Daryn

On Jun 10, 2013, at 8:36 AM, Razen Al Harbi wrote:

> Hello,
> 
> I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one.
> When I put a file (larger than the HDFS block size) into HDFS, all the blocks are stored
> on the machine where the Hadoop put command is invoked.
> 
> For a higher replication factor, I see the same behavior, but the replicated blocks are
> stored randomly on all the other machines.
> 
> Is this normal behavior? If not, what would be the cause?
> 
> Thanks, 
> 
> Razen

