From: Arinto Murdopo <arinto@gmail.com>
Date: Fri, 11 Oct 2013 11:17:46 +0800
Subject: Re: Intermittent DataStreamer Exception while appending to file inside HDFS
To: user@hadoop.apache.org

Thank you for the comprehensive answer,
When I inspect our NameNode UI, I see that 3 datanodes are up.
However, as you mentioned, the log shows only 2 datanodes. Does that mean one of the datanodes was unreachable when we tried to append to the files?
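As a side note, one way to double-check what the namenode actually reports to a client is a sketch along these lines (assumptions: Hadoop 2.x, the DistributedFileSystem API, and a placeholder namenode URI):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ListDatanodes {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "hdfs://namenode:8020" is a placeholder; point it at the cluster's fs.defaultFS.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    DistributedFileSystem dfs = (DistributedFileSystem) fs;

    // Ask the namenode for its datanode report, as seen from this client.
    DatanodeInfo[] nodes = dfs.getDataNodeStats();
    System.out.println("Datanodes reported: " + nodes.length);
    for (DatanodeInfo node : nodes) {
      System.out.println("  " + node.getHostName());
    }
  }
}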

Best regards,

Arinto
www.otnira.com



On Thu, Oct 10, 2013 at 4:57 PM, Uma Maheswara Rao G <maheswara@huawei.com> wrote:

Hi Arinto,

Please disable this feature with smaller clusters: dfs.client.block.write.replace-datanode-on-failure.policy

The reason for this exception is that you have replication set to 3, and from the logs it looks like you have only 2 nodes in the cluster. When the pipeline is first created, we do not do any verification of whether the pipeline DNs meet the replication factor. The above property only controls replacing a DN on failure; additionally, we take advantage of verifying this condition when we reopen the pipeline for append. So here, unfortunately, the existing DNs cannot satisfy the replication factor and the client will try to add another node. Since there are no extra nodes in the cluster beyond the ones already selected, it will fail. With the current configuration you cannot append.

Also, please take a look at the default configuration description:

  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
  <description>
    If there is a datanode/network failure in the write pipeline,
    DFSClient will try to remove the failed datanode from the pipeline
    and then continue writing with the remaining datanodes. As a result,
    the number of datanodes in the pipeline is decreased. The feature is
    to add new datanodes to the pipeline.

    This is a site-wide property to enable/disable the feature.

    When the cluster size is extremely small, e.g. 3 nodes or less, cluster
    administrators may want to set the policy to NEVER in the default
    configuration file or disable this feature. Otherwise, users may
    experience an unusually high rate of pipeline failures since it is
    impossible to find new datanodes for replacement.

    See also dfs.client.block.write.replace-datanode-on-failure.policy
  </description>

Set this configuration to false on your client side.
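For example, a minimal client-side sketch (the property names are the ones quoted in the description above; the file path and appended text are only placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendWithoutDatanodeReplacement {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Disable the replace-datanode-on-failure feature for this client only.
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", false);
    // Alternatively, keep the feature but never ask for a replacement datanode:
    // conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/append-test.txt");  // placeholder path

    FSDataOutputStream out = fs.append(file);
    try {
      out.writeBytes("another line\n");
    } finally {
      out.close();
    }
  }
}

Putting the same properties into the hdfs-site.xml on the client's classpath should have the same effect, without touching the namenode or datanode configuration.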

Regards,

Uma

From: Arinto Murdopo [mailto:arinto@gmail.com]
Sent: 10 October 2013 13:02
To: user@hadoop.apache.org
Subject: Intermittent DataStreamer Exception while appending to file inside HDFS

Hi there,

I have the following exception while appending to an existing file in my HDFS. This error appears intermittently. If the error does not show up, I can append to the file successfully. If the error appears, I cannot append to the file.
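For context, the append itself is an ordinary FileSystem#append call; a hypothetical minimal version (not our actual application code) looks like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
  public static void main(String[] args) throws Exception {
    // Default client configuration, so the replace-datanode-on-failure
    // policy stays at DEFAULT, as reported in the exception below.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path existing = new Path("/data/existing-file.log");  // placeholder path
    FSDataOutputStream out = fs.append(existing);
    try {
      out.writeBytes("appended record\n");  // writes to this stream sometimes fail with the exception below
    } finally {
      out.close();
    }
  }
}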

Here is the error: https://gist.github.com/arinto/d37a56f449c61c9d1d9c

For your convenience, here it is:

13/10/10 14:17:30 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[10.0.106.82:50010, 10.0.106.81:50010], original=[10.0.106.82:50010, 10.0.106.81:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
Some configuration files:
1. hdfs-site.xml: https://gist.github.com/arinto/f5f1522a6f6994ddfc17#file-hdfs-append-datastream-exception-hdfs-site-xml

2. core-site.xml: https://gist.github.com/arinto/0c6f40872181fe26f8b1#file-hdfs-append-datastream-exception-core-site-xml

So, any idea how to solve this issue?

Some links that I've found (but unfortunately they do not help):

1. StackOverflow: our replication factor is 3 and we've never changed the replication factor since we set up the cluster.

2. Impala-User mailing list: the error there is due to the replication factor being set to 1. In our case, we're using replication factor = 3.

Best regards,

Arinto
www.otnira.com