hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: File loss at Nebraska
Date Tue, 09 Dec 2008 19:31:55 GMT
Brian Bockelman wrote:
> On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:
>> Also it might be useful to strongly word hadoop-default.conf as many
>> people might not know a downside exists for using 2 rather than 3 as
>> the replication factor. Before reading this thread I would have
>> thought 2 to be sufficient.
> I think 2 should be sufficient, but running with 2 replicas instead of 3 
> exposes some namenode bugs which are harder to trigger.

Whether 2 is sufficient or not, I completely agree with the latter part. We 
should treat this as what I think it fundamentally is: fixing the Namenode.

I guess that lately either some of these bugs became more likely to trigger 
or some similar bugs crept in.

Sticking with 3 is very good advice for maximizing reliability, but 
from an opportunistic developer's point of view a big cluster running with 
a replication of 2 is a great test case :-). Overall I think it is a good 
thing for Hadoop.

