Date: Sun, 19 Oct 2014 19:56:04 -0700
Subject: Can we add a regular check in DataNode on free disk space?
From: sam liu
To: user@hadoop.apache.org

Hi Experts and Developers,

At present, if a DataNode runs out of free disk space, there is no way to learn of this bad situation from anywhere, including the DataNode log. Under this condition, HDFS write operations fail and return the error message below. However, from that message the user cannot tell that the root cause is the only DataNode running out of disk space, and the DataNode log offers no useful hint either. So I believe it would be better to add a regular check in the DataNode on free disk space, which would log a WARNING or ERROR message in the DataNode log when the node runs out of space. What's your opinion?

Error Msg:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/PiEstimator_TMP_3_141592654/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)

Thanks!
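P.S. To make the proposal concrete, here is a minimal sketch of the kind of check I mean. This is illustrative only, not actual Hadoop code: the class name, threshold, and interval are assumptions, and in a real patch the check would live inside the DataNode, use its logger, and read its threshold from configuration (e.g. alongside dfs.datanode.du.reserved).

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch of a periodic free-space check for a DataNode
 * data directory. Not real Hadoop code; names are hypothetical.
 */
public class FreeSpaceChecker {
    // Illustrative threshold; a real patch would take this from configuration.
    static final long WARN_THRESHOLD_BYTES = 1024L * 1024 * 1024; // 1 GB

    /** Returns usable bytes on the volume holding dataDir. */
    static long usableSpace(File dataDir) {
        return dataDir.getUsableSpace();
    }

    /** One pass of the check; returns true when space is low. */
    static boolean checkOnce(File dataDir) {
        long free = usableSpace(dataDir);
        if (free < WARN_THRESHOLD_BYTES) {
            // In the DataNode this would go through its normal logger as WARN/ERROR.
            System.err.println("WARN: DataNode volume " + dataDir
                + " is low on space: " + free + " bytes usable");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        // Re-run the check every 60 seconds on a single-threaded scheduler.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> checkOnce(dataDir), 0, 60, TimeUnit.SECONDS);
    }
}
```

With something like this running, an administrator would see the low-space warning in the DataNode log before clients start hitting the opaque "could only be replicated to 0 nodes" error above.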