From: Torsten Curdt
Subject: Re: still getting "is valid, and cannot be written to"
Date: Thu, 6 Sep 2007 12:25:22 +0200
To: hadoop-dev@lucene.apache.org

We are still seeing a bunch of these, even with a reduced submit
replication. Are we the only ones seeing those? If not, I'd be running
off to file a bug.

cheers
--
Torsten

On 30.08.2007, at 19:47, Hairong Kuang wrote:

> Namenode does not schedule a block to a datanode that is confirmed to
> hold a replica of the block. But it is not aware of any in-transit
> block placement (i.e. the scheduled but not confirmed block
> placement), so occasionally we may still see "is valid, and cannot be
> written to" errors.
>
> A fix to the problem is to keep track of all in-transit block
> placements, and have the block placement algorithm consider these
> to-be-confirmed replicas as well.
>
> Hairong
>
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org]
> Sent: Thursday, August 30, 2007 10:28 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: still getting "is valid, and cannot be written to"
>
> Raghu Angadi wrote:
>> Torsten Curdt wrote:
>>> I just checked our mapred.submit.replication and it is higher than
>>> the number of nodes in the cluster - maybe that's the problem?
>>
>> This pretty much assures at least a few of these exceptions.
>
> So we have a workaround: lower mapred.submit.replication. And it's
> arguably not a bug, but just a misfeature, since it only causes
> spurious warnings.
>
> One fix might be to try to determine mapred.submit.replication based
> on the cluster size. But that was contentious when that feature was
> added, and I'd rather not re-open that argument again now.
>
>> You can argue that Namenode should not schedule a block to a node
>> twice.. and I agree.
>
> That sounds like a good thing to fix. Should we file a bug?
>
> Doug
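
For illustration only, the fix Hairong describes (tracking in-transit placements so the placement algorithm skips them) might look roughly like the sketch below. This is not Hadoop's actual Namenode code: the class PendingPlacementTracker, its methods, and the use of plain strings for block and datanode ids are all hypothetical names made up for this example. The effectiveReplication helper sketches the cluster-size idea Doug mentions, again as an assumption rather than an existing feature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Hadoop's Namenode implementation.
class PendingPlacementTracker {
    // block id -> datanodes confirmed to hold a replica
    private final Map<String, Set<String>> confirmed = new HashMap<>();
    // block id -> datanodes scheduled to receive a replica, not yet confirmed
    private final Map<String, Set<String>> inTransit = new HashMap<>();

    // Pick up to `wanted` targets, skipping any node that already holds
    // the block or has a placement of it in transit.
    public List<String> chooseTargets(String block, List<String> liveNodes, int wanted) {
        Set<String> held = confirmed.getOrDefault(block, Collections.<String>emptySet());
        Set<String> pending = inTransit.computeIfAbsent(block, b -> new HashSet<>());
        List<String> targets = new ArrayList<>();
        for (String node : liveNodes) {
            if (targets.size() >= wanted) {
                break;
            }
            if (held.contains(node) || pending.contains(node)) {
                continue; // would otherwise trigger a duplicate write
            }
            targets.add(node);
        }
        pending.addAll(targets); // remember the scheduled placements
        return targets;
    }

    // Called when a datanode reports that it has received the block.
    public void confirm(String block, String node) {
        Set<String> pending = inTransit.get(block);
        if (pending != null) {
            pending.remove(node);
        }
        confirmed.computeIfAbsent(block, b -> new HashSet<>()).add(node);
    }

    // Cap a configured replication factor at the number of live nodes,
    // never going below one replica.
    public static int effectiveReplication(int configured, int liveNodeCount) {
        return Math.max(1, Math.min(configured, liveNodeCount));
    }
}
```

A real version would also need to expire in-transit entries when a transfer fails or times out, otherwise a node could be excluded forever; the sketch leaves that out.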