From: Aitor Cedres <acedres@pivotal.io>
To: user@hadoop.apache.org
Date: Wed, 8 Oct 2014 12:14:03 +0100
Subject: Re: Datanode volume full, but not moving to free volume

Hi Brian,

I would try to move the Block Pool directories (BP-1408773897-172.17.1.1-1400769841207). You must shut down your DataNode process before doing this operation.

Regards,

Aitor Cedrés

On 8 October 2014 11:46, Brian C. Huffman wrote:

> Can I move a whole subdir? Or does it have to be individual block files
> / metadata?
>
> For example, I see this:
> [hadoop@thor1 finalized]$ pwd
> /data/data2/hadoop/yarn_data/hdfs/datanode/current/BP-1408773897-172.17.1.1-1400769841207/current/finalized
> [hadoop@thor1 finalized]$ du -sh subdir10/
> 80G    subdir10/
>
> So could I move subdir10 to the same location under /data/data3?
>
> Thanks,
> Brian
>
> Brian C. Huffman
> System Administrator
> ET International, Inc.
>
> On 10/8/14, 4:44 AM, Aitor Cedres wrote:
>
>> Hi Brian,
>>
>> Hadoop does not balance the disks within a DataNode. If you ran out of
>> space and then added additional disks, you should shut down the DataNode
>> and manually move a few files to the new disk.
>>
>> Regards,
>>
>> Aitor Cedrés
>>
>> On 6 October 2014 14:46, Brian C. Huffman wrote:
>>
>>> All,
>>>
>>> I have a small Hadoop cluster (2.5.0) with 4 datanodes and 3 data disks
>>> per node. Lately some of the volumes have been filling up, but instead of
>>> moving to other configured volumes that *have* free space, it's giving
>>> errors in the datanode logs:
>>>
>>> 2014-10-03 11:52:44,989 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>>> thor2.xmen.eti:50010:DataXceiver error processing WRITE_BLOCK
>>> operation  src: /172.17.1.3:35412 dst: /172.17.1.2:50010
>>> java.io.IOException: No space left on device
>>>     at java.io.FileOutputStream.writeBytes(Native Method)
>>>     at java.io.FileOutputStream.write(FileOutputStream.java:345)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:592)
>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:734)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:741)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> Unfortunately it's continuing to try to write, and when it fails, it's
>>> passing the exception to the client.
>>>
>>> I did a restart and then it seemed to figure out that it should move to
>>> the next volume.
>>>
>>> Any suggestions to keep this from happening in the future?
>>>
>>> Also, could it be an issue that I have a small amount of non-HDFS data
>>> on those volumes?
>>>
>>> Thanks,
>>> Brian
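The move Aitor describes can be sketched as a shell session. The block-pool ID and subdir10 are taken from Brian's example; the hadoop-daemon.sh commands and the temp-dir scaffolding (standing in for the real /data/data2 and /data/data3 mounts) are illustrative assumptions, not a tested procedure for this cluster:

```shell
set -eu

# Temp dir standing in for the real mount points (paths hypothetical).
ROOT=$(mktemp -d)
BP=BP-1408773897-172.17.1.1-1400769841207
SRC=$ROOT/data2/hadoop/yarn_data/hdfs/datanode/current/$BP/current/finalized
DST=$ROOT/data3/hadoop/yarn_data/hdfs/datanode/current/$BP/current/finalized

# Fake a subdir on the full source disk for demonstration.
mkdir -p "$SRC/subdir10"
touch "$SRC/subdir10/blk_1073741825" "$SRC/subdir10/blk_1073741825_1001.meta"

# 1. Stop the DataNode before touching block files (command assumed;
#    adjust for your install): $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode

# 2. Recreate the same block-pool path on the new disk, then move the whole
#    subdir; block files and their .meta files travel together.
mkdir -p "$DST"
mv "$SRC/subdir10" "$DST/"

# 3. Restart the DataNode; it rescans its volumes on startup:
#    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode

ls "$DST/subdir10"   # lists blk_1073741825 and its .meta file
```

The key point is step 2: the destination must mirror the full current/<block-pool>/current/finalized path on the new volume, so the DataNode finds the blocks under the same pool after restart.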