hadoop-mapreduce-user mailing list archives

From Fengyun RAO <raofeng...@gmail.com>
Subject Re: recommended block replication for small cluster
Date Fri, 04 Apr 2014 02:47:13 GMT
Thanks, Peyman!

I know it's configurable. What I don't know is whether it's typical to reduce it
in a small cluster.

Or is there a recommended setting, such as 2 for a 10-node cluster, 3 for
100 nodes, and 4 for 1000 nodes?
Or should it just be set to 3, no matter how big the cluster is?
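
To make the trade-off concrete, here is a rough back-of-envelope sketch in Python. It is not a Hadoop API and not an official sizing rule. It assumes each node fails independently with some probability `p_node` during the window before HDFS can re-replicate, and that each block's replicas sit on distinct nodes. The function names and the numbers (`p_node = 0.01`, 100,000 blocks) are made up for illustration, and treating blocks as independent is a simplification.

```python
def block_loss_probability(p_node: float, replication: int) -> float:
    """Chance that every replica of a single block is lost at once.

    Assumes independent node failures and replicas on distinct nodes.
    """
    return p_node ** replication


def any_block_loss_probability(p_node: float, replication: int, num_blocks: int) -> float:
    """Chance that at least one of num_blocks blocks loses all its replicas.

    Treats blocks as independent, which is only an approximation.
    """
    per_block = block_loss_probability(p_node, replication)
    return 1.0 - (1.0 - per_block) ** num_blocks


if __name__ == "__main__":
    p_node = 0.01         # hypothetical per-node failure probability
    num_blocks = 100_000  # hypothetical number of blocks in the cluster
    for replication in (2, 3, 4):
        risk = any_block_loss_probability(p_node, replication, num_blocks)
        print(f"replication={replication}: P(any block lost) = {risk:.6g}")
```

In this simplified model the risk depends on the per-node failure probability and the number of blocks, not directly on the node count, which is part of why I am unsure whether a per-cluster-size rule makes sense.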



2014-04-03 21:13 GMT+08:00 Peyman Mohajerian <mohajeri@gmail.com>:

> The reason for replication also has to do with data locality in a larger
> cluster when running map-reduce jobs. You can reduce the replication;
> that's why it's a configurable parameter.
>
>
> On Thu, Apr 3, 2014 at 7:10 AM, Fengyun RAO <raofengyun@gmail.com> wrote:
>
>> I know the default replication is 3, which ensures reliability when 2
>> nodes crash at the same time.
>>
>> However, for a small cluster, e.g. 10~20 nodes, the probability that 2
>> nodes crash at the same time is very small.
>>
>> Can we simply set the replication to 2, or are there any other drawbacks?
>>
>> Any information is appreciated!
>>
>
>
