Date: Mon, 10 Jun 2013 15:06:09 +0530
From: Mayank
To: user@hadoop.apache.org
Subject: Application errors with one disk on datanode getting filled up to 100%

We are running a Hadoop cluster with 10 datanodes and a namenode. Each datanode is set up with 4 disks (/data1, /data2, /data3, /data4), each with a capacity of 414 GB.


hdfs-site.xml has the following property set:

<property>
    <name>dfs.data.dir</name>
    <value>/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs,/data4/hadoopfs</value>
    <description>Data dirs for DFS.</description>
</property>
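To see how unevenly the four data directories are filled, a small standalone check can split the dfs.data.dir value on commas and ask the JVM for the total and usable space behind each directory. This is only a sketch: the class name VolumeUsageReport is hypothetical (it is not a Hadoop tool), and java.io.File reports figures for the whole filesystem containing each directory, so the numbers correspond to per-disk usage only under the assumption that each entry sits on its own mount, as /data1 to /data4 appear to.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): shows how full the filesystem behind
// each comma-separated dfs.data.dir entry is.
final class VolumeUsageReport {

    /** One report row: a data directory and the space figures of its filesystem. */
    static final class Volume {
        final String dir;
        final long totalBytes;
        final long usableBytes;

        Volume(String dir, long totalBytes, long usableBytes) {
            this.dir = dir;
            this.totalBytes = totalBytes;
            this.usableBytes = usableBytes;
        }

        /** Share of the filesystem not usable by this JVM, in percent; 0 if the path is missing. */
        double usedPercent() {
            return totalBytes == 0 ? 0.0 : 100.0 * (totalBytes - usableBytes) / totalBytes;
        }
    }

    /** Splits a dfs.data.dir value on commas, trimming spaces and skipping empty entries. */
    static List<String> parseDataDirs(String value) {
        List<String> dirs = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    /** Looks up total and usable space for each directory; both are 0 for a missing path. */
    static List<Volume> report(String dataDirValue) {
        List<Volume> volumes = new ArrayList<>();
        for (String dir : parseDataDirs(dataDirValue)) {
            File f = new File(dir);
            volumes.add(new Volume(dir, f.getTotalSpace(), f.getUsableSpace()));
        }
        return volumes;
    }

    public static void main(String[] args) {
        // Defaults to the dfs.data.dir value quoted above; pass another value as args[0].
        String value = args.length > 0 ? args[0]
                : "/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs,/data4/hadoopfs";
        for (Volume v : report(value)) {
            System.out.printf("%-18s %5.1f%% used, %,d bytes usable%n",
                    v.dir, v.usedPercent(), v.usableBytes);
        }
    }
}
```

Running it on each datanode gives a quick per-disk view comparable to `df`, without depending on any Hadoop classes.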

Now we are facing an issue where /data1 fills up quickly, and we often see its usage at 100% with just a few megabytes of free space. This is currently happening on 7 out of 10 datanodes.
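One plausible way a single disk ends up full while the others still have room: assuming the datanode uses the default round-robin volume choice of this Hadoop generation, it places each new block replica on the next volume in turn, skipping a volume only when it lacks room for a block, and does not weight the choice by free space. Under that assumption, a disk that starts with less free space (for example because it also holds non-HDFS files) reaches capacity first. The simulation below is a simplified, hypothetical model, not Hadoop's code; the starting usage figures and the 64 MB block size are assumptions, and only the 414 GB per-disk capacity comes from this thread.

```java
import java.util.Arrays;

// Simplified, hypothetical model of round-robin volume choice on a single datanode.
// It is not Hadoop's code; starting usage and block size are assumptions.
final class RoundRobinFillSimulation {

    /**
     * Writes blocks of blockSize bytes to the volumes in round-robin order and
     * returns the index of the first volume found unable to take another block
     * (free space <= blockSize). The caller's usage array is not modified.
     */
    static int firstVolumeToFill(long[] capacity, long[] initiallyUsed, long blockSize) {
        long[] used = Arrays.copyOf(initiallyUsed, initiallyUsed.length);
        int next = 0; // round-robin cursor
        while (true) {
            int v = next;
            next = (next + 1) % used.length;
            if (capacity[v] - used[v] <= blockSize) {
                return v; // this volume can no longer accept a whole block
            }
            used[v] += blockSize; // place one block; free space plays no other role
        }
    }

    public static void main(String[] args) {
        final long GB = 1024L * 1024 * 1024;
        final long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        String[] names = {"/data1", "/data2", "/data3", "/data4"};
        long[] capacity = {414 * GB, 414 * GB, 414 * GB, 414 * GB};
        // Hypothetical starting point: /data1 already holds 100 GB more than the others.
        long[] used = {150 * GB, 50 * GB, 50 * GB, 50 * GB};
        int first = firstVolumeToFill(capacity, used, blockSize);
        System.out.println(names[first] + " fills up first"); // prints "/data1 fills up first"
    }
}
```

The model stops at the first full volume; in the scenario it describes, writes would then continue on the remaining disks while the full one is skipped.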

We have some Java applications that write to HDFS, and we frequently see the following error in our application logs:

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:50010 are bad. Aborting...
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3093)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
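For context on the message: the HDFS client streams each block through a pipeline of datanodes (one per replica), and when a datanode in the pipeline fails, the client drops it and carries on with the rest. Only when no usable datanode is left does it abort with "All datanodes ... are bad". The class below is a hypothetical, heavily simplified model of that recovery loop, not DFSClient's implementation, and the node names are made up. It shows why one failing datanode is fatal once it is the last node in the pipeline (for example with replication 1, or after the other nodes were already dropped). One plausible link to the full disk: a datanode whose write to a nearly full /data1 fails mid-block would appear to the client as a bad pipeline node.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified model of client-side write-pipeline recovery.
// It is not DFSClient's implementation; node names below are made up.
final class PipelineRecoveryModel {

    /**
     * Repeatedly drops the first datanode in the pipeline that is in failing.
     * Returns the surviving pipeline, or throws once the only remaining node fails.
     */
    static List<String> recover(List<String> pipeline, Set<String> failing) throws IOException {
        List<String> nodes = new ArrayList<>(pipeline);
        while (true) {
            String bad = null;
            for (String node : nodes) {
                if (failing.contains(node)) {
                    bad = node;
                    break;
                }
            }
            if (bad == null) {
                return nodes; // every remaining datanode is healthy
            }
            if (nodes.size() <= 1) {
                // Nothing left to fall back on: the write is abandoned.
                throw new IOException("All datanodes " + bad + " are bad. Aborting...");
            }
            nodes.remove(bad); // drop the bad datanode and continue with the rest
        }
    }

    public static void main(String[] args) {
        Set<String> failing = new HashSet<>(Arrays.asList("dn7:50010"));
        try {
            // Three replicas: losing one datanode is survivable.
            System.out.println(recover(Arrays.asList("dn7:50010", "dn2:50010", "dn3:50010"), failing));
            // One datanode left (e.g. replication 1): the same failure aborts the write.
            System.out.println(recover(Arrays.asList("dn7:50010"), failing));
        } catch (IOException e) {
            System.out.println("java.io.IOException: " + e.getMessage());
        }
    }
}
```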

I went through some old discussions, and it looks like manual rebalancing is what is required in this case, along with setting dfs.datanode.du.reserved.
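dfs.datanode.du.reserved is a per-volume amount of space, in bytes, that the datanode leaves free for non-DFS use; its default of 0 lets HDFS fill a disk completely. As a rough sketch of how the setting enters the datanode's space accounting, assuming a Hadoop 1.x/0.20-era release (which the DFSClient$DFSOutputStream frames in the stack trace suggest), the reserve is subtracted from the volume's capacity, and the result is further capped by the free space the filesystem reports. The class and method names are hypothetical and the usage figures are made up; only the 414 GB disk size comes from this thread.

```java
// Hypothetical model of per-volume available space with dfs.datanode.du.reserved.
// Names and figures are illustrative; this is not Hadoop's source.
final class ReservedSpaceModel {

    /** Capacity HDFS may use on a volume: raw disk capacity minus the reserve, never negative. */
    static long dfsCapacity(long diskCapacity, long reserved) {
        return reserved > diskCapacity ? 0 : diskCapacity - reserved;
    }

    /**
     * Space treated as available for new blocks on one volume:
     * min(dfsCapacity - dfsUsed, free space reported by the filesystem), floored at 0.
     */
    static long available(long diskCapacity, long reserved, long dfsUsed, long fsFree) {
        long remaining = dfsCapacity(diskCapacity, reserved) - dfsUsed;
        long capped = Math.min(remaining, fsFree);
        return Math.max(capped, 0);
    }

    public static void main(String[] args) {
        final long GB = 1024L * 1024 * 1024;
        long disk = 414 * GB;     // per-disk capacity from the thread
        long dfsUsed = 400 * GB;  // hypothetical HDFS usage on /data1
        long fsFree = 14 * GB;    // hypothetical free space reported by the OS
        long reserved = 10 * GB;  // hypothetical dfs.datanode.du.reserved value
        System.out.println("no reserve:    " + available(disk, 0, dfsUsed, fsFree) / GB + " GB");
        System.out.println("10 GB reserve: " + available(disk, reserved, dfsUsed, fsFree) / GB + " GB");
    }
}
```

In this model the reserve keeps headroom only against DFS usage; if non-DFS files consume the disk, the filesystem's own free-space figure becomes the binding limit.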

However, I'd like to understand whether this issue, one disk filling up to 100%, can cause the errors we are seeing in our application.

Also, are there any other performance implications of some disks on a datanode running at 100% usage?
--
Mayank Joshi
Skype: mail2mayank
Mb.: +91 8690625808

Blog: http://www.techynfreesouls.co.nr
PhotoStream: http://picasaweb.google.com/mail2mayank

Today is the tomorrow I was so worried about yesterday ...