From: Nitin Pawar <nitinpawar432@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 20 Oct 2014 12:02:10 +0530
Subject: Re: Can add a regular check in DataNode on free disk space?

Hi Sam,

Monitoring disks and other server-related activities can be easily handled by Nagios.
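
As a rough illustration of the Nagios route: a plugin only has to print one status line and exit with the conventional code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is a minimal Python version, not an official Nagios or Hadoop plugin. The function names and the 20%/10% thresholds are assumptions made for this example.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(free_bytes, total_bytes, warn_pct=20.0, crit_pct=10.0):
    """Map free space to a Nagios status code and a one-line message."""
    if total_bytes <= 0:
        return UNKNOWN, "DISK UNKNOWN - total size reported as 0"
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct <= crit_pct:
        return CRITICAL, f"DISK CRITICAL - {free_pct:.1f}% free"
    if free_pct <= warn_pct:
        return WARNING, f"DISK WARNING - {free_pct:.1f}% free"
    return OK, f"DISK OK - {free_pct:.1f}% free"


def check_path(path, warn_pct=20.0, crit_pct=10.0):
    """Check one mount point; any OS error is reported as UNKNOWN."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return classify(usage.free, usage.total, warn_pct, crit_pct)


def main(argv):
    """Print the status line and return the exit code for the given path."""
    # Hypothetical default; pass the DataNode data directory instead.
    target = argv[1] if len(argv) > 1 else "/"
    code, message = check_path(target)
    print(message)
    return code
```

To run it as a plugin, a small wrapper script would call `sys.exit(main(sys.argv))` and be pointed at the directory where the DataNode stores its blocks.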

On Mon, Oct 20, 2014 at 11:58 AM, Dhiraj Kamble <Dhiraj.Kamble@sandisk.com> wrote:

Formatting the NameNode will cause data loss: in effect you will lose all your data on the DataNodes (or rather, access to the data on the DataNodes). The NameNode will have no idea where your data (files) are stored. I don’t think that’s what you’re looking for.

I am wondering why there isn’t any log information on the DataNode for a full disk. What version of Hadoop are you using, and what’s your configuration (Single Node, Single Node Pseudo-Distributed, or Cluster)?

Regards,

Dhiraj


From: sam liu [mailto:samliuhadoop@gmail.com]
Sent: Monday, October 20, 2014 11:51 AM
To: user@hadoop.apache.org
Subject: Re: Can add a regular check in DataNode on free disk space?

Hi unmesha,

Thanks for your response, but I am not clear what effect the above operations will have on the Hadoop cluster. Could you please explain further?

2014-10-19 21:37 GMT-07:00 unmesha sreeveni <unmeshabiju@gmail.com>:

1. Stop all Hadoop daemons.
2. Remove all files from /var/lib/hadoop-hdfs/cache/hdfs/dfs/name
3. Format the NameNode.
4. Start all Hadoop daemons.

On Mon, Oct 20, 2014 at 8:26 AM, sam liu <samliuhadoop@gmail.com> wrote:

Hi Experts and Developers,

At present, if a DataNode has no free disk space, we cannot find out about this bad situation from anywhere, including the DataNode log. At the same time, in this situation HDFS write operations fail and return the error message below. However, from the error message the user cannot tell that the root cause is that the only DataNode has run out of disk space, and there is no useful hint in the DataNode log either. So I believe it would be better to add a regular check of free disk space in the DataNode, which would write a WARNING or ERROR message to the DataNode log when that DataNode runs out of space. What's your opinion?

Error Msg:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)

Thanks!
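
The proposed check can be approximated outside the DataNode with a small watcher script. The sketch below is only an illustration in Python, not Hadoop code: the logger name, function names, and byte thresholds are hypothetical, and the directories passed in would be whatever the DataNode is configured to store blocks in.

```python
import logging
import shutil
import threading

log = logging.getLogger("datanode.diskcheck")  # hypothetical logger name


def log_free_space(data_dirs, warn_bytes, error_bytes):
    """Log a line for each directory low on space; return the worst level seen."""
    worst = logging.INFO
    for d in data_dirs:
        try:
            free = shutil.disk_usage(d).free
        except OSError as exc:
            # An unreadable data directory is treated as an error as well.
            log.error("cannot check %s: %s", d, exc)
            worst = max(worst, logging.ERROR)
            continue
        if free <= error_bytes:
            level = logging.ERROR
        elif free <= warn_bytes:
            level = logging.WARNING
        else:
            continue  # enough space, nothing to report
        log.log(level, "%s has only %d bytes free", d, free)
        worst = max(worst, level)
    return worst


def run_periodically(stop_event, interval_s, data_dirs, warn_bytes, error_bytes):
    """Repeat the check every interval_s seconds until stop_event is set."""
    # Event.wait returns False on timeout and True once the event is set.
    while not stop_event.wait(interval_s):
        log_free_space(data_dirs, warn_bytes, error_bytes)
```

Running `run_periodically` in a background thread with a `threading.Event` gives a regular check that can be stopped cleanly. The thresholds are left to the operator, since a sensible value depends on block size and disk capacity.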



--
Thanks & Regards

Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Center for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/




PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).




--
Nitin Pawar