hadoop-common-dev mailing list archives

From Torsten Curdt <tcu...@apache.org>
Subject Re: still getting "is valid, and cannot be written to"
Date Thu, 06 Sep 2007 10:25:22 GMT
We are still seeing a bunch of these, even with a reduced
mapred.submit.replication.
Are we the only ones seeing them? If not, I'll go ahead and file a
bug.

cheers
--
Torsten

On 30.08.2007, at 19:47, Hairong Kuang wrote:

> Namenode does not schedule a block to a datanode that is confirmed to
> hold a replica of the block. But it is not aware of any in-transit
> block placement (i.e. the scheduled but not confirmed block placement),
> so occasionally we may still see "is valid, and cannot be written to"
> errors.
>
> A fix to the problem is to keep track of all in-transit block
> placements, and the block placement algorithm considers these
> to-be-confirmed replicas as well.
>
> Hairong
>
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org]
> Sent: Thursday, August 30, 2007 10:28 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: still getting "is valid, and cannot be written to"
>
> Raghu Angadi wrote:
>> Torsten Curdt wrote:
>>> I just checked our mapred.submit.replication and it is higher than
>>> the nodes in the cluster - maybe that's the problem?
>>
>> This pretty much assures at least a few of these exceptions.
>
> So we have a workaround: lower mapred.submit.replication.  And it's
> arguably not a bug, but just a misfeature, since it only causes
> spurious warnings.
>
> One fix might be to try to determine mapred.submit.replication based
> on the cluster size.  But that was contentious when that feature was
> added, and I'd rather not re-open that argument again now.
>
>> You can argue that Namenode should not schedule a block to a node
>> twice.. and I agree.
>
> That sounds like a good thing to fix.  Should we file a bug?
>
> Doug
>
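
For reference, the fix Hairong describes in the quoted thread (tracking in-transit placements so the placement algorithm skips them) could be sketched roughly as follows. This is only an illustration under my own assumptions, not the actual Namenode code: the class InTransitPlacements, its methods, and the use of plain strings for datanodes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in Hadoop. */
class InTransitPlacements {
    // blockId -> datanodes confirmed to hold a replica
    private final Map<Long, Set<String>> confirmed = new HashMap<>();
    // blockId -> datanodes scheduled to receive a replica, not yet confirmed
    private final Map<Long, Set<String>> inTransit = new HashMap<>();

    /**
     * Picks up to `wanted` targets for a block, skipping nodes that already
     * hold a confirmed replica or have one in transit. May return fewer
     * targets than requested if the cluster is too small.
     */
    List<String> chooseTargets(long blockId, List<String> liveNodes, int wanted) {
        Set<String> have = confirmed.computeIfAbsent(blockId, k -> new HashSet<>());
        Set<String> pending = inTransit.computeIfAbsent(blockId, k -> new HashSet<>());
        List<String> targets = new ArrayList<>();
        for (String node : liveNodes) {
            if (targets.size() >= wanted) {
                break;
            }
            if (have.contains(node) || pending.contains(node)) {
                continue; // never schedule the same block to a node twice
            }
            targets.add(node);
        }
        pending.addAll(targets); // remember the scheduled placements
        return targets;
    }

    /** Called when a datanode reports that it has received the block. */
    void confirm(long blockId, String node) {
        Set<String> pending = inTransit.get(blockId);
        if (pending != null) {
            pending.remove(node);
        }
        confirmed.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
    }
}
```

With three live nodes and a wanted replication of 2, a second scheduling round for the same block would only pick the one remaining node instead of re-targeting the first two. A real fix would presumably also have to expire in-transit entries for transfers that never get confirmed, which this sketch leaves out.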

