Message-ID: <402237923.1205331046987.JavaMail.jira@brutus>
Date: Wed, 12 Mar 2008 07:10:46 -0700 (PDT)
From: "Allen Wittenauer (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-2150) dfs.data.dir syntax needs revamping: multiple percentages and weights
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm

[ https://issues.apache.org/jira/browse/HADOOP-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HADOOP-2150:
-------------------------------------

Description:
Currently, all filesystems listed in the dfs.data.dir are
treated the same with respect to the space reservation percentages. This makes sense on homogeneous, dedicated machines, but breaks badly on heterogeneous ones and creates a bit of a support nightmare.

In a grid with multiple disk sizes, the admin either leaves space unallocated or is required to slice up the disk. In addition, if Hadoop isn't the only application running, there may be unexpected collisions. To work around this limitation, the administrator must specifically partition up filesystem space such that the reservations 'make sense' for all of the configured file systems. For example, someone may have 2 small file systems and 2 big ones on a single machine due to various requirements (such as the OS being mirrored, systems built from spare parts, server consolidation, whatever). Reserving 10% might make sense on the small file systems (say 7G), but 10% may leave a lot more space free than desired on the big ones (say 50G).

Instead, Hadoop should support a more robust syntax for directory layout. Ideally, an admin should be able to specify the directory location, the amount of space reserved (in either a percentage or a raw size syntax) for HDFS, as well as a weighting such that some file systems may be preferred over others. In the example above, the two larger file systems would likely be preferred over the two smaller ones. Additionally, the reservation on the larger file systems might be changed such that it matches the 7G on the smaller file systems.

Doing so would allow for much more complex configuration scenarios without having to shuffle a lot of things around at the operating system level.

was:
Currently, all filesystems listed in the dfs.data.dir are treated the same with respect to the space reservation percentages. This makes sense on homogeneous, dedicated machines, but breaks badly on heterogeneous ones and creates a bit of a support nightmare.

In a grid with multiple disk sizes, the admin either leaves space unallocated or is required to slice up the disk. In addition, if Hadoop isn't the only application running, there may be unexpected collisions. To work around this limitation, the administrator must specifically partition up filesystem space such that the reservations 'make sense' for all of the configured filesystems. For example, someone may have 2 small file systems and 2 big ones on a single machine due to various requirements (such as the OS being mirrored, systems built from spare parts, server consolidation, whatever). Reserving 10% might make sense on the small file systems (say 7G), but 10% may leave a lot more space free than desired on the big ones (say 50G).

Instead, Hadoop should support a more robust syntax for directory layout. Ideally, an admin should be able to specify the directory location, the amount of space reserved--in either a percentage or a raw size syntax--for HDFS, as well as a weighting such that some file systems may be preferred over others. In the example above, the two larger file systems would likely be preferred over the two smaller ones. Additionally, the reservation on the larger file systems might be changed such that it matches the 7G on the smaller file systems.

Doing so would allow for much more complex configuration scenarios without having to shuffle a lot of things around at the operating system level.

> dfs.data.dir syntax needs revamping: multiple percentages and weights
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-2150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>         Environment: This is likely a cross-platform issue.
>            Reporter: Allen Wittenauer
>            Priority: Minor
>
> Currently, all filesystems listed in the dfs.data.dir are treated the same with respect to the space reservation percentages.
This makes sense on homogeneous, dedicated machines, but breaks badly on heterogeneous ones and creates a bit of a support nightmare.
> In a grid with multiple disk sizes, the admin either leaves space unallocated or is required to slice up the disk. In addition, if Hadoop isn't the only application running, there may be unexpected collisions. To work around this limitation, the administrator must specifically partition up filesystem space such that the reservations 'make sense' for all of the configured file systems. For example, someone may have 2 small file systems and 2 big ones on a single machine due to various requirements (such as the OS being mirrored, systems built from spare parts, server consolidation, whatever). Reserving 10% might make sense on the small file systems (say 7G), but 10% may leave a lot more space free than desired on the big ones (say 50G).
> Instead, Hadoop should support a more robust syntax for directory layout. Ideally, an admin should be able to specify the directory location, the amount of space reserved (in either a percentage or a raw size syntax) for HDFS, as well as a weighting such that some file systems may be preferred over others. In the example above, the two larger file systems would likely be preferred over the two smaller ones. Additionally, the reservation on the larger file systems might be changed such that it matches the 7G on the smaller file systems.
> Doing so would allow for much more complex configuration scenarios without having to shuffle a lot of things around at the operating system level.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
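
To make the request above a bit more concrete, here is a minimal sketch of what a per-directory syntax could look like, written in Python for brevity. Everything in it is an assumption for illustration only: the `path;reserve=...;weight=...` entry format, the `DataDir` class, and the `parse_data_dirs` helper are hypothetical and are not an existing or agreed-upon Hadoop syntax or API.

```python
from dataclasses import dataclass
from typing import List

# Binary multipliers for raw size suffixes (an assumption of this sketch).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


@dataclass
class DataDir:
    """One hypothetical dfs.data.dir entry."""
    path: str
    reserve: str = "0"  # either a percentage ("10%") or a raw size ("7G")
    weight: int = 1     # larger weight = more preferred (sketch convention)

    def reserved_bytes(self, capacity: int) -> int:
        """Bytes held back on a volume with the given capacity in bytes."""
        r = self.reserve.strip().upper()
        if r.endswith("%"):
            return capacity * int(r[:-1]) // 100
        if r[-1] in UNITS:
            return int(r[:-1]) * UNITS[r[-1]]
        return int(r)  # plain byte count


def parse_data_dirs(value: str) -> List[DataDir]:
    """Parse comma-separated 'path[;reserve=..][;weight=..]' entries."""
    dirs = []
    for entry in value.split(","):
        path, *opts = entry.strip().split(";")
        kv = dict(opt.split("=", 1) for opt in opts)
        dirs.append(DataDir(path, kv.get("reserve", "0"), int(kv.get("weight", 1))))
    return dirs


# Hypothetical layout: a small volume with a percentage reservation and a
# big, preferred volume with a fixed 7G reservation.
example = parse_data_dirs("/data/small1;reserve=10%, /data/big1;reserve=7G;weight=4")
```

With something along these lines, the reservation on a big file system could be pinned to a raw size while the small ones keep a percentage, and the weights would give block placement a hint about which directories to prefer. How the weights would actually feed into placement is left open here.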