hadoop-common-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: intermediate results files
Date Tue, 02 Jul 2013 00:34:31 GMT
I see. The difference arises because the next block of data is not written
to HDFS until the previous block has been successfully written to 'all' the
DataNodes (DNs) selected for replication. A higher replication factor (RF)
therefore means each block write takes longer to complete.
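
As a rough illustration of what that means for a job's output, here is a
back-of-the-envelope sketch. It assumes the throughput figures John recalled
below (about 50MB/sec at replication=1 and about 33MB/sec at replication=3),
which are unverified. The function name and the 10 GB output size are made up
for the example.

```python
# Back-of-the-envelope estimate only. The throughputs are the recalled,
# unverified benchmark figures quoted in this thread, not measurements.
RECALLED_MB_PER_SEC = {1: 50.0, 3: 33.0}  # replication factor -> MB/sec


def estimated_write_seconds(size_mb, replication):
    """Estimate seconds to write size_mb of output at the recalled throughput."""
    return size_mb / RECALLED_MB_PER_SEC[replication]


if __name__ == "__main__":
    size_mb = 10 * 1024  # hypothetical 10 GB of reducer output
    for rf in sorted(RECALLED_MB_PER_SEC):
        seconds = estimated_write_seconds(size_mb, rf)
        print("replication=%d: about %.0f seconds" % (rf, seconds))
```

On those figures, the replication=3 write would take roughly 1.5 times as long
as the replication=1 write for the same amount of output.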

Warm Regards,
Tariq
cloudfront.blogspot.com


On Tue, Jul 2, 2013 at 4:39 AM, John Lilley <john.lilley@redpoint.net> wrote:

> I’ve seen some benchmarks where replication=1 runs at about 50MB/sec and
> replication=3 runs at about 33MB/sec, but I can’t seem to find that now.
>
> John
>
> *From:* Mohammad Tariq [mailto:dontariq@gmail.com]
> *Sent:* Monday, July 01, 2013 5:03 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: intermediate results files
>
> Hello John,
>
> IMHO, it doesn't matter. Your job will write the result just once.
> Replica creation is handled at the HDFS layer, so it has nothing to do with
> your job. Your job will still be writing at the same speed.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
> On Tue, Jul 2, 2013 at 4:16 AM, John Lilley <john.lilley@redpoint.net>
> wrote:
>
> If my reducers are going to create results that are temporary in nature
> (consumed by the next processing stage), is it recommended to use a
> replication factor <3 to improve performance?
>
> Thanks
>
> john
