From: Todd Lipcon
Date: Fri, 4 Dec 2009 08:35:56 -0800
Subject: Re: DFSClient write error when DN down
To: common-user@hadoop.apache.org
Message-ID: <45f85f70912040835g532631f9x18b98157fa56c155@mail.gmail.com>
In-Reply-To: <286066.53661.qm@web110105.mail.gq1.yahoo.com>
References: <45f85f70910091044mc9f2a26sa1bbea152f443946@mail.gmail.com>
 <201857.67894.qm@web110101.mail.gq1.yahoo.com>
 <286066.53661.qm@web110105.mail.gq1.yahoo.com>

Hi Arvind,

Looks to me like you've identified the JIRAs that are causing this.
Hopefully they will be fixed soon.

-Todd

On Fri, Dec 4, 2009 at 4:43 AM, Arvind Sharma wrote:

> Any suggestions would be welcome :-)
>
> Arvind
>
> ________________________________
> From: Arvind Sharma
> To: common-user@hadoop.apache.org
> Sent: Wed, December 2, 2009 8:02:39 AM
> Subject: DFSClient write error when DN down
>
> I have seen similar error logs in the Hadoop Jira (Hadoop-2691, HDFS-795 )
> but not sure this one is exactly the same scenario.
>
> Hadoop - 0.19.2
>
> The client side DFSClient fails to write when few of the DN in a grid goes
> down. I see this error :
>
> ***************************
>
> 2009-11-13 13:45:27,815 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_3028932254678171367_1462691java.io.IOException: Bad response 1 for block blk_3028932254678171367_1462691 from datanode 10.201.9.225:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
> 2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 bad datanode[2] 10.201.9.225:50010
> 2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 in pipeline 10.201.9.218:50010, 10.201.9.220:50010, 10.201.9.225:50010: bad datanode 10.201.9.225:50010
> 2009-11-13 13:45:37,433 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_-6619123912237837733_1462799java.io.IOException: Bad response 1 for block blk_-6619123912237837733_1462799 from datanode 10.201.9.225:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
> 2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 bad datanode[1] 10.201.9.225:50010
> 2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 in pipeline 10.201.9.218:50010, 10.201.9.225:50010: bad datanode 10.201.9.225:50010
>
> ***************************
>
> The only way I could get my client program to write successfully to the DFS
> was to re-start it.
>
> Any suggestions how to get around this problem on the client side ? As I
> understood, the DFSClient APIs will take care of situations like this and
> the clients don't need to worry about if some of the DN goes down.
>
> Also, the replication factor is 3 in my setup and there are 10 DN (out of
> which TWO went down)
>
> Thanks!
> Arvind
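
For anyone comparing their own client logs against the ones quoted above, here is a rough sketch that tallies which datanode is reported bad in the "Error Recovery ... in pipeline" lines. It assumes the log line layout shown in this thread (Hadoop 0.19.2); the regex and the function name `bad_datanodes` are illustrative only and are not part of Hadoop.

```python
import re
from collections import Counter

# Matches recovery lines of the form quoted in this thread:
#   ... Error Recovery for block blk_X in pipeline A, B, C: bad datanode C
# Assumption: datanodes are printed as IPv4:port, as in the quoted log.
RECOVERY_RE = re.compile(
    r"Error Recovery for block (?P<block>blk_-?\d+_\d+) "
    r"in pipeline (?P<pipeline>[\d.:, ]+): "
    r"bad datanode (?P<bad>\d+\.\d+\.\d+\.\d+:\d+)"
)


def bad_datanodes(log_lines):
    """Count how often each datanode is named as bad in pipeline recovery lines."""
    counts = Counter()
    for line in log_lines:
        match = RECOVERY_RE.search(line)
        if match:
            counts[match.group("bad")] += 1
    return counts


# The two pipeline recovery lines from the quoted log.
sample = [
    "2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block "
    "blk_3028932254678171367_1462691 in pipeline 10.201.9.218:50010, "
    "10.201.9.220:50010, 10.201.9.225:50010: bad datanode 10.201.9.225:50010",
    "2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block "
    "blk_-6619123912237837733_1462799 in pipeline 10.201.9.218:50010, "
    "10.201.9.225:50010: bad datanode 10.201.9.225:50010",
]

if __name__ == "__main__":
    for node, hits in bad_datanodes(sample).items():
        print(node, hits)
```

This only summarizes what the client logged. It does not change the client's recovery behavior, and the shorter "bad datanode[N]" lines are skipped on purpose so each recovery attempt is counted once.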