From: "Igor Bolotin"
Reply-To: hadoop-dev@lucene.apache.org
Subject: RE: Question about HDFS allocations
Date: Mon, 31 Dec 2007 13:56:40 -0800

There is a configuration property that allows you to reserve some disk space on datanode servers:

<property>
  <name>dfs.datanode.du.reserved</name>
  <description>Reserved space in bytes. Always leave this much space free for non dfs use</description>
  <value>1000000000</value>
</property>

-Igor

-----Original Message-----
From: Michael Bieniosek [mailto:michael@powerset.com]
Sent: Monday, December 31, 2007 1:27 PM
To: hadoop-dev@lucene.apache.org; Bryan Duxbury
Subject: Re: Question about HDFS allocations

AFAIK, HDFS doesn't have any notion of balancing data, nor can it do much to avoid running disks full. What you describe would certainly be a useful feature.

There is a crude way to force the DFS to rebalance: if a machine gets too full, you can remove it from the DFS cluster. The namenode will then redistribute all the blocks that were on that machine. Then you can wipe your datanode's DFS data and bring it up afresh.

-Michael

On 12/31/07 11:31 AM, "Bryan Duxbury" wrote:

We've been doing some testing with HBase, and one of the problems we ran into was that our machines are not homogeneous in terms of disk capacity. A few of our machines only have 80 GB drives, where the rest have 250s. As the equal distribution of blocks went on, these smaller machines filled up first, completely overloading the drives, and came to a crashing halt. Since one of these machines was also the namenode, it broke the rest of the cluster.

What I'm wondering is if there should be a way to tell HDFS to only use something like 80% of available disk space before considering a machine full.
Would this be a useful feature, or should we approach the problem from another angle, like using a separate HDFS data partition?

-Bryan
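
To compare the fixed byte reserve Igor mentions (dfs.datanode.du.reserved) with the 80% cap Bryan asks about, here is a rough arithmetic sketch for the 80 GB and 250 GB drives in this thread. It only does the subtraction; it does not model how the datanode actually accounts for space. The function names are hypothetical, and decimal gigabytes (10^9 bytes) are an assumption.

```python
GB = 10**9  # assumption: decimal gigabytes, as drive vendors count them


def usable_with_fixed_reserve(capacity_bytes, reserved_bytes):
    """Bytes left for DFS blocks when a fixed number of bytes is held back."""
    return max(capacity_bytes - reserved_bytes, 0)


def usable_with_percent_cap(capacity_bytes, percent):
    """Bytes left for DFS blocks when only `percent` of the disk may be used."""
    return capacity_bytes * percent // 100


def reserve_for_percent_cap(capacity_bytes, percent):
    """Fixed reserve in bytes that has the same effect as a percent cap."""
    return capacity_bytes - usable_with_percent_cap(capacity_bytes, percent)


if __name__ == "__main__":
    igor_reserve = 1_000_000_000  # the value quoted in Igor's reply
    for capacity in (80 * GB, 250 * GB):
        fixed = usable_with_fixed_reserve(capacity, igor_reserve)
        capped = usable_with_percent_cap(capacity, 80)
        equivalent = reserve_for_percent_cap(capacity, 80)
        print(
            f"{capacity // GB} GB drive: "
            f"{fixed // GB} GB usable with the fixed reserve, "
            f"{capped // GB} GB usable with an 80% cap "
            f"(same as reserving {equivalent} bytes)"
        )
```

Under these assumptions, the 1000000000-byte reserve leaves 79 GB usable on the small drives and 249 GB on the large ones, while an 80% cap leaves 64 GB and 200 GB. A single fixed value therefore cannot reproduce a percentage cap on mixed hardware; each class of machine would need its own reserve (16000000000 bytes for the 80 GB drives, 50000000000 for the 250 GB drives).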