hadoop-hdfs-issues mailing list archives

From "Paul Smith (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-284) dfs.data.dir syntax needs revamping: multiple percentages and weights
Date Fri, 25 Sep 2009 02:35:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759358#action_12759358 ]

Paul Smith commented on HDFS-284:

I think this could be one of these "If we build it, they will come" issues. Most of the Hadoop
committers are working in large-scale homogeneous environments (lucky them). They are probably
not against this, but it's probably so far down their Care Factor scale that it's just getting ignored.

For us fine folk who desire it, perhaps a custom patch could be shared and applied by
hand (or someone could do a custom build and say "we can use the one produced here"). That
would generate further local-community noise and some confidence that the change is being used
in some 'real world' systems without problems. Perhaps then the patch will be accepted.

In summary, I wouldn't wait for the committers.

> dfs.data.dir syntax needs revamping: multiple percentages and weights
> ---------------------------------------------------------------------
>                 Key: HDFS-284
>                 URL: https://issues.apache.org/jira/browse/HDFS-284
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>         Environment: This is likely a cross-platform issue.
>            Reporter: Allen Wittenauer
>            Priority: Minor
> Currently, all filesystems listed in dfs.data.dir are treated the same with respect
to the space reservation percentages. This makes sense on homogeneous, dedicated machines,
but breaks badly on heterogeneous ones and creates a bit of a support nightmare.
> In a grid with multiple disk sizes, the admin must either leave space unallocated or
slice up the disks. In addition, if Hadoop isn't the only application running,
there may be unexpected collisions with other applications over disk space. To work around
this limitation, the administrator must partition filesystem space so that the reservation
makes sense for all of the configured file systems. For example, someone may have 2 small
file systems and 2 big ones on a single machine due to various requirements (such as the OS
being mirrored, systems built from spare parts, server consolidation, whatever). Reserving 10%
might make sense on the small file systems (say 7G), but on the big ones the same 10% may leave
far more space free than desired (say 50G).
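The arithmetic behind this example can be sketched in a few lines of Python. The volume sizes (70G and 500G) are assumptions inferred from the "10% is 7G" and "10% is 50G" figures, and `reserved_bytes` is a hypothetical helper for illustration, not a Hadoop function.

```python
GB = 1024 ** 3


def reserved_bytes(capacity, pct):
    """Bytes held back from HDFS when one percentage applies to every volume."""
    return capacity * pct // 100


# Hypothetical sizes: 10% of 70G is the "7G" and 10% of 500G is the "50G" above.
volumes = {"small1": 70 * GB, "small2": 70 * GB, "big1": 500 * GB, "big2": 500 * GB}

uniform = {name: reserved_bytes(cap, 10) for name, cap in volumes.items()}
fixed = {name: 7 * GB for name in volumes}  # the flat 7G the admin actually wants

for name in volumes:
    print(f"{name}: {uniform[name] // GB}G reserved at 10%, {fixed[name] // GB}G wanted")
print(f"total: {sum(uniform.values()) // GB}G vs {sum(fixed.values()) // GB}G")
```

With these assumed sizes, a uniform 10% holds back 114G in total where a flat 7G per volume would hold back only 28G.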
> Instead, Hadoop should support a more robust syntax for directory layout.  Ideally, an
admin should be able to specify the directory location, the amount of space reserved (in either
a percentage or a raw size syntax) for HDFS, as well as a weighting such that some file systems
may be preferred over others.  In the example above, the two larger file systems would likely
be preferred over the two smaller ones. Additionally, the reservation on the larger file
systems might be changed so that it matches the 7G on the smaller file systems.
> Doing so would allow for much more complex configuration scenarios without having to
shuffle a lot of things around at the operating system level.
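To make the proposal more concrete, here is a minimal Python sketch of what parsing a richer dfs.data.dir value might look like. Everything here is hypothetical: the issue does not define a syntax, so the comma-separated `path;reserve=...;weight=...` format, the `DataDir` class, and the `parse_data_dirs`/`pick_dir` helpers are illustrations only. Weighted random selection is just one possible way to honor the weights.

```python
import random
from dataclasses import dataclass

# Multipliers for the hypothetical raw-size suffixes.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


@dataclass
class DataDir:
    path: str
    reserve: str  # "N%" of the volume, a raw size such as "7g", or plain bytes
    weight: int   # relative preference when placing new blocks

    def reserved_bytes(self, capacity):
        """Resolve the reservation to bytes for a volume of the given capacity."""
        if self.reserve.endswith("%"):
            return capacity * int(self.reserve[:-1]) // 100
        suffix = self.reserve[-1].lower()
        if suffix in UNITS:
            return int(self.reserve[:-1]) * UNITS[suffix]
        return int(self.reserve)


def parse_data_dirs(value):
    """Parse comma-separated 'path;reserve=...;weight=...' entries (hypothetical syntax)."""
    dirs = []
    for entry in value.split(","):
        path, *opts = entry.strip().split(";")
        fields = dict(opt.split("=", 1) for opt in opts)
        # Defaults when omitted: no reservation, weight 1.
        dirs.append(DataDir(path, fields.get("reserve", "0"), int(fields.get("weight", "1"))))
    return dirs


def pick_dir(dirs, rng=random):
    """Pick a directory with probability proportional to its weight."""
    return rng.choices(dirs, weights=[d.weight for d in dirs], k=1)[0]


conf = ("/small1;reserve=10%;weight=1,/small2;reserve=10%;weight=1,"
        "/big1;reserve=7g;weight=3,/big2;reserve=7g;weight=3")
dirs = parse_data_dirs(conf)
```

In this sketch the big volumes reserve the same 7G as the small ones (10% of an assumed 70G), and each big volume receives about three times as many new blocks as each small one. A real implementation would also need to skip volumes whose free space has fallen below their reservation.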

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
