hadoop-hdfs-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: intermediate results files
Date Mon, 01 Jul 2013 23:09:52 GMT
I've seen some benchmarks where replication=1 runs at about 50MB/sec and replication=3 runs
at about 33MB/sec, but I can't seem to find that now.
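
As a rough back-of-the-envelope sketch of what those two figures would imply, here is the arithmetic for a hypothetical 10 GB intermediate file. The file size, the class name and the method name are assumptions made for illustration, and the 50 and 33 MB/sec rates are the recalled numbers above taken at face value, not measurements.

```java
public class ReplicationWriteEstimate {
    // Seconds needed to write sizeMb megabytes at a sustained rate of mbPerSec.
    public static double secondsToWrite(double sizeMb, double mbPerSec) {
        if (mbPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return sizeMb / mbPerSec;
    }

    public static void main(String[] args) {
        double sizeMb = 10_000.0; // hypothetical 10 GB intermediate file (decimal MB)

        double rep1 = secondsToWrite(sizeMb, 50.0); // recalled rate for replication=1
        double rep3 = secondsToWrite(sizeMb, 33.0); // recalled rate for replication=3

        System.out.printf("replication=1: %.0f s%n", rep1);
        System.out.printf("replication=3: %.0f s%n", rep3);
        System.out.printf("slowdown: %.2fx%n", rep3 / rep1);
    }
}
```

Under those assumptions the write takes about 200 seconds at replication=1 and about 303 seconds at replication=3, roughly a 1.5x difference. Whether a real cluster shows the same ratio would depend on network and disk layout.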

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Monday, July 01, 2013 5:03 PM
To: user@hadoop.apache.org
Subject: Re: intermediate results files

Hello John,

      IMHO, it doesn't matter. Your job will write the result just once. Replica creation
is handled at the HDFS layer so it has nothing to do with your job. Your job will still be writing
at the same speed.

Warm Regards,

On Tue, Jul 2, 2013 at 4:16 AM, John Lilley <john.lilley@redpoint.net> wrote:
If my reducers are going to create results that are temporary in nature (consumed by the next
processing stage), is it recommended to use a replication factor <3 to improve performance?
