Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <286066.53661.qm@web110105.mail.gq1.yahoo.com>
References: <45f85f70910091044mc9f2a26sa1bbea152f443946@mail.gmail.com>
	<201857.67894.qm@web110101.mail.gq1.yahoo.com>
 <286066.53661.qm@web110105.mail.gq1.yahoo.com>
From: Todd Lipcon <todd@cloudera.com>
Date: Fri, 4 Dec 2009 08:35:56 -0800
Message-ID: <45f85f70912040835g532631f9x18b98157fa56c155@mail.gmail.com>
Subject: Re: DFSClient write error when DN down
To: common-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=00504502ae02e5dc8b0479e9b4a5

--00504502ae02e5dc8b0479e9b4a5
Content-Type: text/plain; charset=ISO-8859-1

Hi Arvind,

Looks to me like you've identified the JIRAs that are causing this.
Hopefully they will be fixed soon.
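
Until then, one possible client-side stopgap is to retry the whole write,
opening a fresh output stream on each attempt instead of reusing the failed
one. Below is a minimal sketch of that retry loop, with no Hadoop calls in it.
RetryingWrite, WriteAttempt and writeWithRetry are hypothetical names, not
part of the DFSClient API, and the failure in main() is simulated. Whether
reopening the stream is enough on 0.19.2 is untested; since only a restart
helped in your case, it may not be.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingWrite {

    /** Hypothetical stand-in for "open a fresh stream, write, close". */
    @FunctionalInterface
    public interface WriteAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt until it succeeds or maxAttempts is reached.
     * Returns the number of attempts used; rethrows the last IOException
     * if every attempt fails.
     */
    public static int writeWithRetry(WriteAttempt attempt, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return i;
            } catch (IOException e) {
                last = e;
                if (i < maxAttempts) {
                    // Linear backoff before the next attempt.
                    Thread.sleep(backoffMillis * i);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated write: fails twice, then succeeds.
        int used = writeWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("Bad response 1 for block (simulated)");
            }
        }, 5, 10L);
        System.out.println("succeeded after " + used + " attempts");
    }
}
```

In a real client the body of the attempt would create the file, write the
data and close the stream, so that each retry starts from a new pipeline.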

-Todd

On Fri, Dec 4, 2009 at 4:43 AM, Arvind Sharma <arvind321@yahoo.com> wrote:

> Any suggestions would be welcome :-)
>
> Arvind
>
> ________________________________
> From: Arvind Sharma <arvind321@yahoo.com>
> To: common-user@hadoop.apache.org
> Sent: Wed, December 2, 2009 8:02:39 AM
> Subject: DFSClient write error when DN down
>
>
>
> I have seen similar error logs in the Hadoop JIRA (HADOOP-2691, HDFS-795),
> but I am not sure this is exactly the same scenario.
>
> Hadoop - 0.19.2
>
> The client-side DFSClient fails to write when a few of the DNs in the grid
> go down. I see this error:
>
> ***************************
>
> 2009-11-13 13:45:27,815 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_3028932254678171367_1462691java.io.IOException: Bad response 1 for block blk_3028932254678171367_1462691 from datanode 10.201.9.225:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
> 2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 bad datanode[2] 10.201.9.225:50010
> 2009-11-13 13:45:27,815 WARN DFSClient | Error Recovery for block blk_3028932254678171367_1462691 in pipeline 10.201.9.218:50010, 10.201.9.220:50010, 10.201.9.225:50010: bad datanode 10
> ...201.9.225:50010
> 2009-11-13 13:45:37,433 WARN DFSClient | DFSOutputStream ResponseProcessor exception for block blk_-6619123912237837733_1462799java.io.IOException: Bad response 1 for block blk_-6619123912237837733_1462799 from datanode 10.201.9.225:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2341)
> 2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 bad datanode[1] 10.201.9.225:50010
> 2009-11-13 13:45:37,433 WARN DFSClient | Error Recovery for block blk_-6619123912237837733_1462799 in pipeline 10.201.9.218:50010, 10.201.9.225:50010: bad datanode 10.201.9.225:50010
>
>
> ***************************
>
> The only way I could get my client program to write successfully to the DFS
> was to restart it.
>
> Any suggestions on how to work around this problem on the client side? As I
> understand it, the DFSClient APIs should handle situations like this, so
> clients should not need to worry if some of the DNs go down.
>
> Also, the replication factor is 3 in my setup and there are 10 DNs (of
> which TWO went down).
>
>
> Thanks!
> Arvind
>

--00504502ae02e5dc8b0479e9b4a5--