Hello Paul, here is a quick answer to your question:
You can use the dfs.datanode.du.pct and dfs.datanode.du.reserved properties in the
hdfs-site.xml config file to limit the maximum local disk space used by HDFS and MapReduce.
<property>
<name>dfs.datanode.du.pct</name>
<value>0.85f</value>
<description>When calculating remaining space, only use this percentage of the real
available space
</description>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>1070000</value>
<description>Reserved space in bytes per volume. Always leave this much space free
for non dfs use.
</description>
</property>
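Since dfs.datanode.du.reserved is given in bytes per volume, a small script can help turn a "this node should use no more than X GB" limit into a value. The following is only a rough Python sketch and not part of Hadoop: the helper names reserved_bytes_for_cap and property_xml are made up for illustration, and it assumes the reserved amount is simply the volume size minus the cap. That is my reading of the property description, not a guarantee of how the datanode accounts for space.

```python
import shutil

GIB = 1024 ** 3  # bytes in one GiB


def reserved_bytes_for_cap(total_bytes: int, cap_gib: float) -> int:
    """Bytes to reserve on a volume so that roughly cap_gib GiB is left for DFS.

    Assumption: reserved space = volume size - desired DFS cap.
    Returns 0 when the volume is smaller than the cap.
    """
    cap_bytes = int(cap_gib * GIB)
    return max(0, total_bytes - cap_bytes)


def property_xml(name: str, value) -> str:
    """Render one <property> entry in the hdfs-site.xml style shown above."""
    return (
        "<property>\n"
        f"<name>{name}</name>\n"
        f"<value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # "." is a placeholder; substitute the volume that holds dfs.data.dir.
    total = shutil.disk_usage(".").total
    reserved = reserved_bytes_for_cap(total, cap_gib=20)
    print(property_xml("dfs.datanode.du.reserved", reserved))
```

Run it on each node with the path of the volume that holds dfs.data.dir in place of ".", then paste the printed entry into that node's hdfs-site.xml. The 20 GiB cap is just an example figure.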
For more information, please refer to http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html#Configuration+Files
Thanks,
Ravi
On 9/22/09 5:38 PM, "Paul Smith" <psmith@aconex.com> wrote:
Hi, I recognize that Hadoop has built-in quotas for directories inside
HDFS, and that one can configure the 'dfs.data.dir' property to
specify the paths to use on a local node for DFS blocks, but I have a
couple of questions regarding setting up a trial Hadoop cluster for
R&D purposes that utilises our existing Engineering team's local
desktop computers together with a few server-quality machines we have
in the office. This is a throwaway cluster used for nothing but
running training, tests, experiments etc. I've successfully set this
up across 12 nodes, but I've run into some logistical problems. Each
computer in the cluster is already doing something else but has spare
CPU cycles and disk space that could be useful for Hadoop.
Firstly, each Engineer has different disk spaces available, which is
fine, because I could create a '/home/hadoop/disk1' directory on each
one and ensure that it's either a symlink to some other directory on a
volume that has space, or that it's just a real directory where the /home
volume is sitting. However, it is still possible to fill up this
volume, and the local Engineer's computer can get into a weird state when
that disk fills up (originally I had the default config that used /tmp,
which caused a bit of havoc initially, whoops). I could probably
poke around and find a volume on each node that won't affect the local
computer if it fills up, but that might not be a good idea (the one
volume that could be filled up without effect is probably a tiny volume).
I was wondering whether anyone had any ideas. I sort of need a local-node
quota system ("This node should use no more than X GB"). I was
initially investigating using disk quotas at the Unix filesystem
level, but thought I'd ask before I went down that path in case
someone else had a much better idea.
Obviously this is only useful for test clusters; in a real-world setup
the manageability of it simply wouldn't scale beyond a few handfuls of
nodes, but this would allow me to set up a reasonable-sized cluster for
some good experiments without clobbering existing processes and work
that are being done.
cheers,
Paul Smith