hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: File loss at Nebraska
Date Tue, 09 Dec 2008 19:18:37 GMT

On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:

> Also it might be useful to strongly word hadoop-default.conf as many
> people might not know a downside exists for using 2 rather than 3 as
> the replication factor. Before reading this thread I would have
> thought 2 to be sufficient.

I think 2 should be sufficient, but running with 2 replicas instead of
3 exposes namenode bugs that are much harder to trigger with 3.

For example, let's say your system has 100 nodes and 1M blocks.  Let's
say a namenode bug affects a replica of block X on node Y and the
namenode doesn't realize it.  Then there is a 1% chance that the next
node to go down holds the only remaining replica, and the block goes
missing.  If this bug is cumulative or affects many blocks (I suspect
about 500-1000 blocks out of 1M are problematic), you're almost
guaranteed to lose data whenever a single node goes down.
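
A quick back-of-envelope check (Python, assuming the 100-node /
1000-bad-replica numbers above and roughly uniform placement of the
surviving replicas; just the arithmetic, not measured data):

    nodes = 100
    bad_blocks = 1000            # blocks whose other replica is already bad

    # Chance that a randomly failing node holds the lone good copy of one block
    p_per_block = 1.0 / (nodes - 1)

    # Chance that at least one affected block loses its last copy when a node dies
    p_any_loss = 1 - (1 - p_per_block) ** bad_blocks
    print(p_any_loss)            # ~0.99996, i.e. "almost guaranteed"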

On the other hand, if you have 1000 block replica problems on the same
cluster with 3 replicas, then in order to lose files, two of the block
replica problems must hit the same block and the node which goes down
must hold the third replica.  The probability of this happening is
(1e-6) * (1e-6) * (1/100) = 1e-14, or 0.000000000001%.
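
Multiplying out the same figures for the 3-replica case, taking the
per-block numbers above at face value:

    # Two independent replica problems landing on the same block, and the
    # failing node holding the third copy (figures as stated above)
    p_same_block = 1e-6 * 1e-6
    p_third_on_failed_node = 1.0 / 100
    print(p_same_block * p_third_on_failed_node)   # 1e-14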

So, even if my probability calculations are off somewhere, a site
running with 2 replicas is more than 10 orders of magnitude more
likely to discover inconsistencies or other bugs in the namenode than
a site with 3 replicas.

Accordingly, these sites are the "canaries in the coal mine" to  
discover NameNode bugs.

Brian

