hadoop-hdfs-user mailing list archives

From Ravi Prakash <ravi...@ymail.com>
Subject Re: intermediate results files
Date Tue, 02 Jul 2013 15:48:14 GMT
Hi John!
If your block is going to be replicated to three nodes, then under the default block placement
policy two of the replicas will be on the same rack and the third will be on a different rack.
Depending on the intra-rack and inter-rack network bandwidth available, writing with replication
factor=3 may be almost as fast as, or (more likely) slower than, writing with a lower factor.
With replication factor=2, the default block placement puts the two replicas on different racks,
so you wouldn't gain much over replication factor=3. So you can either:
1. Choose replication factor = 1, or
2. Change the block placement policy so that even with replication factor=2 it chooses
two nodes in the same rack.
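
A rough way to count the network copies per block under the placement described above. This
is a simplified sketch, not HDFS code: the class and method names are made up for illustration,
it only models replication factors 1 to 3, and it assumes the writer runs on a datanode so the
first replica is written locally.

```java
// Simplified model of the default HDFS block placement described above.
// Assumption: the writer runs on a datanode, so replica 1 is written locally.
final class ReplicaPipelineSketch {

    /** Number of replica copies that must cross a rack boundary. */
    static int interRackTransfers(int replication) {
        checkRange(replication);
        // Replica 2 is placed on a different rack than replica 1.
        return replication >= 2 ? 1 : 0;
    }

    /** Number of replica copies sent between two nodes in the same rack. */
    static int intraRackTransfers(int replication) {
        checkRange(replication);
        // Replica 3 is placed on another node in the same rack as replica 2.
        return replication == 3 ? 1 : 0;
    }

    private static void checkRange(int replication) {
        if (replication < 1 || replication > 3) {
            throw new IllegalArgumentException("this sketch only models replication 1..3");
        }
    }

    public static void main(String[] args) {
        for (int r = 1; r <= 3; r++) {
            System.out.println("replication=" + r
                    + " inter-rack=" + interRackTransfers(r)
                    + " intra-rack=" + intraRackTransfers(r));
        }
    }
}
```

In this model replication 2 and 3 both need one inter-rack copy, which is why dropping from 3
to 2 saves little when the inter-rack link is the bottleneck, while replication 1 needs no
network copy at all.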


From: Devaraj k <devaraj.k@huawei.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org> 
Sent: Tuesday, July 2, 2013 1:00 AM
Subject: RE: intermediate results files

If you are 100% sure that all the data nodes will be available and healthy for that period
of time, you can choose a replication factor of 1, or any value <3.
Devaraj k
From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: 02 July 2013 04:40
To: user@hadoop.apache.org
Subject: RE: intermediate results files
I’ve seen some benchmarks where replication=1 runs at about 50MB/sec and replication=3 runs
at about 33MB/sec, but I can’t seem to find that now.
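
To put those two recalled figures in perspective, here is a small arithmetic sketch. The
throughput numbers are only the unverified ones quoted above, the 10,000 MB output size is a
hypothetical example, and the class and method names are invented for illustration.

```java
// Back-of-the-envelope write time from a sustained throughput figure.
final class WriteTimeSketch {

    /** Seconds needed to write the given number of megabytes at a steady rate. */
    static double secondsToWrite(double megabytes, double megabytesPerSecond) {
        if (megabytesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return megabytes / megabytesPerSecond;
    }

    public static void main(String[] args) {
        double outputMb = 10_000;                        // hypothetical intermediate output size
        double rep1 = secondsToWrite(outputMb, 50.0);    // recalled figure for replication=1
        double rep3 = secondsToWrite(outputMb, 33.0);    // recalled figure for replication=3
        System.out.printf("replication=1: %.0f s, replication=3: %.0f s%n", rep1, rep3);
    }
}
```

At those rates the same output takes about 200 seconds with replication=1 and about 303
seconds with replication=3, i.e. roughly half again as long.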
From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Monday, July 01, 2013 5:03 PM
To: user@hadoop.apache.org
Subject: Re: intermediate results files
Hello John,
IMHO, it doesn't matter. Your job will write the result just once. Replica creation
is handled at the HDFS layer, so it has nothing to do with your job. Your job will still be
writing at the same speed.

Warm Regards,
On Tue, Jul 2, 2013 at 4:16 AM, John Lilley <john.lilley@redpoint.net> wrote:
If my reducers are going to create results that are temporary in nature (consumed by the next
processing stage) is it recommended to use a replication factor <3 to improve performance? 
