hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Could only be replicated to 0 nodes, instead of 1
Date Thu, 21 May 2009 19:24:00 GMT
Brian Bockelman wrote:
> 
> On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:
> 
>>
>> I think you should file a JIRA on this. Most likely, this is what is 
>> happening:
>>
>> * Two out of the 3 DNs cannot take any more blocks.
>> * While picking nodes for a new block, the NN mostly skips the third 
>> DN as well, since the '# active writes' on it is larger than '2 * avg'.
>> * Even if only one other block is being written on the 3rd, its load 
>> of 1 is still greater than (2 * 1/3).
>>
>> To test this: if you write just one block to an idle cluster, it 
>> should succeed.
>>
>> Writing from a client on the 3rd DN succeeds, since the local node is 
>> always favored.
>>
>> This particular problem is not that severe on a large cluster, but 
>> HDFS should still do the sensible thing.
>>
> 
> Hey Raghu,
> 
> If this analysis is right, I would add that it can happen even on 
> large clusters!  I've seen this error on our cluster when we're very 
> full (>97%) and very few nodes have any empty space.  This usually 
> happens because we have two very large nodes (10x bigger than the rest 
> of the cluster), and HDFS tends to distribute writes randomly -- 
> meaning the smaller nodes fill up quickly until the balancer can catch 
> up.

Yes. This would bite whenever a large portion of the nodes cannot accept 
blocks. In general it can happen whenever fewer than half the nodes have 
any space left: if only k of n nodes have space and each carries w active 
writes, the average is k*w/n, so w exceeds the '2 * avg' threshold 
exactly when k < n/2 (the sketch below walks through the 3-node case).

Raghu.
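
To make the heuristic concrete, here is a minimal, self-contained sketch 
of the '2 * avg' check in Java. The class and method names are 
hypothetical (this is not the actual NameNode placement code), but the 
arithmetic matches the 3-node scenario described above:

// A minimal sketch of the load heuristic described in this thread.
// Names are hypothetical -- this is not the real NameNode code.
import java.util.ArrayList;
import java.util.List;

class DataNodeInfo {
    final String name;
    final boolean hasSpace;   // can this node accept another block?
    final int activeWrites;   // the '# active writes' counter

    DataNodeInfo(String name, boolean hasSpace, int activeWrites) {
        this.name = name;
        this.hasSpace = hasSpace;
        this.activeWrites = activeWrites;
    }
}

public class PlacementSketch {

    // Total active writes divided by cluster size: the 'avg' in '2 * avg'.
    static double averageLoad(List<DataNodeInfo> nodes) {
        int total = 0;
        for (DataNodeInfo dn : nodes) {
            total += dn.activeWrites;
        }
        return (double) total / nodes.size();
    }

    // Pick targets, skipping full nodes and nodes loaded above 2x average.
    static List<DataNodeInfo> chooseTargets(List<DataNodeInfo> nodes) {
        double threshold = 2 * averageLoad(nodes);
        List<DataNodeInfo> chosen = new ArrayList<>();
        for (DataNodeInfo dn : nodes) {
            if (dn.hasSpace && dn.activeWrites <= threshold) {
                chosen.add(dn);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // The 3-node scenario from the thread: two DNs are full, and the
        // third is handling a single active write.
        List<DataNodeInfo> cluster = List.of(
                new DataNodeInfo("dn1", false, 0),
                new DataNodeInfo("dn2", false, 0),
                new DataNodeInfo("dn3", true, 1));

        // avg = 1/3, threshold = 2/3; dn3's load of 1 exceeds it, so every
        // node is rejected: "could only be replicated to 0 nodes".
        System.out.println("threshold = " + 2 * averageLoad(cluster));
        System.out.println("eligible  = " + chooseTargets(cluster).size());
    }
}

Running it prints a threshold of about 0.67 and zero eligible targets. 
The sketch omits the local-node preference, which is why in the real 
placement policy a client writing from the 3rd DN still succeeds.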

