From: "Brian C. Huffman" <bhuffman@etinternational.com>
To: user@hadoop.apache.org
Date: Wed, 08 Oct 2014 07:39:03 -0400
Subject: Re: Datanode volume full, but not moving to free volume
Hmmm..  It seems that there's only one block pool per disk.  So that won't help me.  :-(

Also, I see the blockpool directory names are all the same.  Is that expected?  So even if I put a larger disk in, I couldn't consolidate the smaller disk's blockpool directories?
[hadoop@thor1 current]$ ls /data/data1/hadoop/yarn_data/hdfs/datanode/current
BP-1408773897-172.17.1.1-1400769841207  VERSION
[hadoop@thor1 current]$ ls /data/data2/hadoop/yarn_data/hdfs/datanode/current
BP-1408773897-172.17.1.1-1400769841207  VERSION
[hadoop@thor1 current]$ ls /data/data3/hadoop/yarn_data/hdfs/datanode/current
BP-1408773897-172.17.1.1-1400769841207  VERSION
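
(Aside, for the archives: identical BP-* names are expected. The block pool ID is assigned cluster-wide when the NameNode is formatted, so every data disk on every datanode carries the same directory name. A quick way to confirm they all match, using the paths from this setup, is:)

```shell
# Print the block pool directory name found on each data disk, then
# deduplicate; a single line of output means all disks agree on the ID.
for d in /data/data1 /data/data2 /data/data3; do
  ls "$d/hadoop/yarn_data/hdfs/datanode/current" | grep '^BP-'
done | sort -u
```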

Regards,
Brian

On 10/8/14, 7:14 AM, Aitor Cedres wrote:

Hi Brian,

I would try to move the block pool directories (BP-1408773897-172.17.1.1-1400769841207). You must shut down your DataNode process before doing this operation.

Regards,

Aitor Cedrés

On 8 October 2014 11:46, Brian C. Huffman <bhuffman@etinternational.com> wrote:
Can I move a whole subdir?  Or does it have to be individual block files / metadata?

For example, I see this:
[hadoop@thor1 finalized]$ pwd
/data/data2/hadoop/yarn_data/hdfs/datanode/current/BP-1408773897-172.17.1.1-1400769841207/current/finalized
[hadoop@thor1 finalized]$ du -sh subdir10/
80G    subdir10/

So could I move subdir10 to the same location under /data/data3?
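
(In case it helps others reading this later: yes, a whole finalized subdir can be relocated, since the block files and their .meta files move together, but only with the DataNode stopped. A hypothetical sketch, assuming the paths from this thread and the stock `hadoop-daemon.sh` start/stop script:)

```shell
# Relocate one finalized subdir from a full disk to one with free space.
# The DataNode must be stopped first, or it may report stale block locations.
BP=BP-1408773897-172.17.1.1-1400769841207
SRC=/data/data2/hadoop/yarn_data/hdfs/datanode/current/$BP/current/finalized
DST=/data/data3/hadoop/yarn_data/hdfs/datanode/current/$BP/current/finalized

hadoop-daemon.sh stop datanode   # assumes the standard sbin script is on PATH
mkdir -p "$DST"
mv "$SRC/subdir10" "$DST/"       # block files and .meta files move as a unit
hadoop-daemon.sh start datanode  # the DataNode rescans its volumes on startup
```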

Thanks,
Brian


Brian C. Huffman
System Administrator
ET International, Inc.

On 10/8/14, 4:44 AM, Aitor Cedres wrote:

Hi Brian,

Hadoop does not balance the disks within a DataNode. If you ran out of space and then added additional disks, you should shut down the DataNode and manually move a few files to the new disk.

Regards,

Aitor Cedrés

On 6 October 2014 14:46, Brian C. Huffman <bhuffman@etinternational.com> wrote:
All,

I have a small Hadoop cluster (2.5.0) with 4 datanodes and 3 data disks per node.  Lately some of the volumes have been filling up, but instead of moving to other configured volumes that *have* free space, it's giving errors in the datanode logs:
2014-10-03 11:52:44,989 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: thor2.xmen.eti:50010:DataXceiver error processing WRITE_BLOCK
 operation  src: /172.17.1.3:35412 dst: /172.17.1.2:50010
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:592)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:734)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:741)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
    at java.lang.Thread.run(Thread.java:745)

Unfortunately it keeps trying to write, and when it fails, it passes the exception on to the client.

I did a restart and then it seemed to figure out that it should move to the next volume.

Any suggestions to keep this from happening in the future?

Also - could it be an issue that I have a small amount of non-HDFS data on those volumes?
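
(For anyone hitting this in the archives: two hdfs-site.xml settings are relevant here. The volume choosing policy can be switched from the default round-robin to one that prefers volumes with more free space, and `dfs.datanode.du.reserved` sets aside per-volume headroom so non-HDFS data doesn't push a disk to ENOSPC. A sketch, with the reserved size being an illustrative value to tune for your disks:)

```xml
<!-- hdfs-site.xml on the DataNodes; restart the DataNode after changing. -->
<property>
  <!-- Prefer volumes with more free space instead of pure round-robin
       (available since Hadoop 2.1, HDFS-1804). -->
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <!-- Reserve space per volume for non-HDFS use (bytes; 10 GB here). -->
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
```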

Thanks,
Brian




