Date: Wed, 2 Mar 2011 16:36:36 +0000 (UTC)
From: "Wang Xu (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <365862954.8115.1299083796975.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Commented: (HDFS-1312) Re-balance disks within a Datanode

    [ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001477#comment-13001477 ]

Wang Xu commented on HDFS-1312:
-------------------------------

Hi folks,

Here is the basic design of the process. Are there any other considerations?

The basic flow is:
# Re-balancing should only proceed while the node is not under heavy load (should this be guaranteed by the administrator?).
# Calculate the total and average available and used space of the dirs.
# Find the disks that have the most and the least space, and decide the move direction. We need to define an imbalance threshold here to decide whether re-balancing is worthwhile.
# Lock the origin disks: stop writes to them and wait for finalization on them.
# Find the deepest dirs on every selected disk and move blocks from those dirs. If a dir becomes empty, the dir should also be removed.
# Check the balance status while the blocks are being migrated, and break out of the loop once it reaches the threshold.
# Release the lock.

The cases that should be taken into account:
* If a disk has much less space than the other disks, it might have the least available space, yet be unable to migrate blocks out.
* If two or more dirs are located on the same disk, they might confuse the space calculation. This is exactly the case in a MiniDFSCluster deployment.

> Re-balance disks within a Datanode
> ----------------------------------
>
>                 Key: HDFS-1312
>                 URL: https://issues.apache.org/jira/browse/HDFS-1312
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>            Reporter: Travis Crawford
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations where certain disks are full while others are significantly less used. Users at many different sites have experienced this issue, and HDFS administrators are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling disks at the sameish rate.
> Possible solutions include:
> - Weighting less-used disks more heavily when placing new blocks on the datanode. In write-heavy environments this will still make use of all spindles, equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are added/replaced in older cluster nodes.
> Datanodes should actively manage their local disks so operator intervention is not needed.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
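
The space calculation, direction choice and threshold check from the flow in the comment above (steps 2, 3 and 6) could be sketched roughly as follows. This is only an illustration, not code from any patch on HDFS-1312: the `Volume` class, the `plan_move` function and the 10% default threshold are all hypothetical names and values. It also assumes that storage dirs sharing a physical disk have already been merged into a single `Volume`, and that disks are compared by used ratio rather than absolute free space, which is one possible way to handle a disk that is much smaller than the others.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """One physical disk (hypothetical model, not an HDFS class).

    Dirs that live on the same disk are assumed to be merged into one
    Volume beforehand, so they do not distort the space calculation.
    """
    name: str
    capacity: int  # bytes, assumed > 0
    used: int      # bytes

    @property
    def used_ratio(self) -> float:
        return self.used / self.capacity


def plan_move(volumes, threshold=0.10):
    """Return (source, target, bytes_to_move), or None if no move is needed.

    threshold is an assumed imbalance threshold: the gap in used ratio
    between the fullest and the emptiest disk below which re-balancing
    is not considered worthwhile.
    """
    if len(volumes) < 2:
        return None

    # Step 2: node-wide totals and the average used ratio.
    total_capacity = sum(v.capacity for v in volumes)
    total_used = sum(v.used for v in volumes)
    average_ratio = total_used / total_capacity

    # Step 3: fullest disk is the source, emptiest disk is the target.
    source = max(volumes, key=lambda v: v.used_ratio)
    target = min(volumes, key=lambda v: v.used_ratio)
    if source.used_ratio - target.used_ratio <= threshold:
        return None

    # Move only as much as brings either disk to the node-wide average,
    # so neither side overshoots it.
    surplus = source.used - int(average_ratio * source.capacity)
    deficit = int(average_ratio * target.capacity) - target.used
    return source, target, min(surplus, deficit)


if __name__ == "__main__":
    disks = [
        Volume("disk1", capacity=1000, used=900),
        Volume("disk2", capacity=1000, used=100),
        Volume("disk3", capacity=500, used=250),
    ]
    plan = plan_move(disks)
    if plan is not None:
        src, dst, nbytes = plan
        print(f"move {nbytes} bytes from {src.name} to {dst.name}")
```

For step 6, a mover loop could call `plan_move` again after each batch of blocks and stop as soon as it returns `None`. Locking the origin disks (step 4) and picking blocks from the deepest dirs (step 5) are left out of the sketch.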