hadoop-common-dev mailing list archives

From "Hairong Kuang" <hair...@yahoo-inc.com>
Subject RE: still getting "is valid, and cannot be written to"
Date Thu, 06 Sep 2007 16:57:46 GMT
Hi Torsten,

We occasionally see this too, although it is more likely to show up on a
small-scale cluster. I filed a JIRA issue at
https://issues.apache.org/jira/browse/HADOOP-1845.

Cheers,
Hairong

-----Original Message-----
From: Torsten Curdt [mailto:tcurdt@apache.org] 
Sent: Thursday, September 06, 2007 3:25 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: still getting "is valid, and cannot be written to"

We are still seeing a bunch of these, even with a reduced submit replication.
Are we the only ones seeing them? If not, I'll go ahead and file a bug.

cheers
--
Torsten

On 30.08.2007, at 19:47, Hairong Kuang wrote:

> Namenode does not schedule a block to a datanode that is confirmed to 
> hold a replica of the block. But it is not aware of any in-transit 
> block placement (i.e. the scheduled but not confirmed block 
> placement), so occasionally we may still see "is valid, and cannot be 
> written to" errors.
>
> A fix for the problem is to keep track of all in-transit block 
> placements and to have the block placement algorithm consider these 
> to-be-confirmed replicas as well.
>
> Hairong
>
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org]
> Sent: Thursday, August 30, 2007 10:28 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: still getting "is valid, and cannot be written to"
>
> Raghu Angadi wrote:
>> Torsten Curdt wrote:
>>> I just checked our mapred.submit.replication and it is higher than 
>>> the nodes in the cluster - maybe that's the problem?
>>
>> This pretty much assures at least a few of these exceptions.
>
> So we have a workaround: lower mapred.submit.replication.  And it's 
> arguably not a bug, but just a misfeature, since it only causes 
> spurious warnings.
>
> One fix might be to try to determine mapred.submit.replication based 
> on the cluster size.  But that was contentious when that feature was 
> added, and I'd rather not re-open that argument now.
>
>> You can argue that Namenode should not schedule a block to a node 
>> twice.. and I agree.
>
> That sounds like a good thing to fix.  Should we file a bug?
>
> Doug
>
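
The fix Hairong describes in the quoted message above (tracking in-transit placements so that the placement algorithm also skips to-be-confirmed replicas) can be illustrated with a toy model. This is only a sketch and not Hadoop code: the class `PlacementTracker`, its method names, and the block and datanode identifiers are all hypothetical, and the real namenode logic is considerably more involved.

```python
from collections import defaultdict


class PlacementTracker:
    """Toy model (hypothetical, not Hadoop's API) of replica placement bookkeeping."""

    def __init__(self):
        # block id -> datanodes that have confirmed holding a replica
        self.confirmed = defaultdict(set)
        # block id -> datanodes scheduled to receive a replica, not yet confirmed
        self.in_transit = defaultdict(set)

    def choose_targets(self, block, candidates, wanted):
        """Pick up to `wanted` nodes that neither hold nor are about to receive `block`."""
        excluded = self.confirmed[block] | self.in_transit[block]
        targets = [node for node in candidates if node not in excluded][:wanted]
        # Remember the scheduled placements so a later call cannot pick them again.
        self.in_transit[block].update(targets)
        return targets

    def confirm(self, block, node):
        """The datanode reported the replica: move it from in-transit to confirmed."""
        self.in_transit[block].discard(node)
        self.confirmed[block].add(node)

    def abandon(self, block, node):
        """The transfer failed or timed out: the node becomes eligible again."""
        self.in_transit[block].discard(node)


tracker = PlacementTracker()
nodes = ["dn1", "dn2", "dn3"]
first = tracker.choose_targets("blk_1", nodes, 2)   # schedules two nodes
second = tracker.choose_targets("blk_1", nodes, 2)  # only the remaining node is eligible
```

Without the `in_transit` set, the second call could hand out a node that is already receiving the block, which is the situation that leads to the "is valid, and cannot be written to" message.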

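Doug's idea of deriving mapred.submit.replication from the cluster size could look roughly like the following. This is a minimal sketch under stated assumptions: the function `effective_submit_replication` and its parameters are hypothetical, and nothing in the thread says Hadoop does or should behave this way.

```python
def effective_submit_replication(configured, live_datanodes):
    """Clamp the configured submit replication to the number of live datanodes.

    Hypothetical helper: asking for more replicas than there are datanodes
    cannot be satisfied, so the request is capped (but never drops below 1).
    """
    if configured < 1:
        raise ValueError("replication must be at least 1")
    return max(1, min(configured, live_datanodes))


# A configured value of 10 on a 4-node cluster is capped at 4.
capped = effective_submit_replication(10, 4)
```

Until something like this exists, the workaround from the thread is to lower mapred.submit.replication by hand so that it does not exceed the number of nodes.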

