hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: File loss at Nebraska
Date Tue, 09 Dec 2008 19:38:08 GMT

On Dec 9, 2008, at 5:31 PM, Raghu Angadi wrote:

> Brian Bockelman wrote:
>> On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:
>>> Also it might be useful to strongly word hadoop-default.conf as many
>>> people might not know a downside exists for using 2 rather than 3 as
>>> the replication factor. Before reading this thread I would have
>>> thought 2 to be sufficient.
>> I think 2 should be sufficient, but running with 2 replicas instead
>> of 3 exposes some namenode bugs which are otherwise harder to trigger.
> Whether 2 is sufficient or not, I completely agree with the latter part.
> We should treat this as what I think it fundamentally is: fixing the
> Namenode.
> I guess lately some of these bugs either got more likely or some  
> similar bugs crept in.
> Sticking with 3 is very good advice for maximizing reliability,
> but from an opportunistic developer's point of view a big cluster
> running with replication of 2 is a great test case :-). Overall I
> think it is a good thing for Hadoop.

Well, we're most likely here to stay: this is the secondary site for  
most of these files.  As long as we can indeed identify lost files,  
retransferring them is fairly automated.  The number of unique files on
this site is around 0.1% or less of the total, and we plan on setting only
those to 3 replicas.
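
As a rough illustration of raising replication for only the unique files, here is a minimal sketch. It assumes the stock `hadoop fs -setrep` shell command is the mechanism used and that a list of unique paths is already available; the paths and the helper name `setrep_commands` are hypothetical. It only builds the command lines and does not execute anything.

```python
import shlex


def setrep_commands(unique_paths, replication=3):
    """Build one `hadoop fs -setrep` argument list per path.

    Nothing is executed here; the caller decides whether to run the
    commands (for example with subprocess.run) or just review them.
    """
    commands = []
    for path in unique_paths:
        # Each path gets its own command, so one failure does not affect the rest.
        commands.append(["hadoop", "fs", "-setrep", str(replication), path])
    return commands


if __name__ == "__main__":
    # Hypothetical paths standing in for the files unique to this site.
    unique = ["/store/unique/file_a.root", "/store/unique/file_b.root"]
    for cmd in setrep_commands(unique):
        # shlex.join quotes arguments so the line is safe to paste into a shell.
        print(shlex.join(cmd))
```

Files not in the list keep the cluster default (2 replicas in this setup), so the extra storage cost stays proportional to the small unique fraction.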

So, we'll be happy to provide whatever logs or debugging info is  
needed, as long as someone cares to keep on fixing bugs.

