hadoop-common-user mailing list archives

From Paul Smith <psm...@aconex.com>
Subject local node Quotas (for an R&D cluster)
Date Wed, 23 Sep 2009 00:38:16 GMT
Hi, I recognize that Hadoop has built-in quotas for directories inside
HDFS, and that one can configure the 'dfs.data.dir' property to specify
the paths to use on a local node for DFS blocks, but I have a couple of
questions regarding setting up a trial Hadoop cluster for R&D purposes
that utilises our existing Engineering team's local desktop computers
together with a few server-quality machines we have in the office. This
is a throwaway cluster used for nothing but running training, tests,
experiments, etc. I've successfully set this up across 12 nodes, but
I've run into some logistical problems. Each computer in the cluster is
already doing something else, but has spare CPU cycles and disk space
that could be useful for Hadoop.
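(For reference, the built-in HDFS quotas I mean are the per-directory
dfsadmin ones, which are cluster-wide rather than per-node. A rough
sketch from memory, using a made-up /user/paul directory:)

    # Cap the raw space consumed under an HDFS directory (cluster-wide,
    # not per-node, which is what I'm actually after here)
    hadoop dfsadmin -setSpaceQuota 10g /user/paul

    # There's also a namespace quota (file/directory count)
    hadoop dfsadmin -setQuota 100000 /user/paul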

Firstly, each Engineer has a different amount of disk space available,
which is fine, because I could create a '/home/hadoop/disk1' directory
on each one and ensure that it's either a symlink to some other
directory on a volume that has space, or just a real directory on
whatever volume /home is sitting on. However, it is still possible to
fill up this volume, and the local Engineer's computer can get into a
weird state when that disk fills up (originally I had the default config
that used /tmp, which caused a bit of havoc initially, whoops). I could
probably poke around and find a volume on each node that won't affect
the local computer if it fills up, but that might not be a good idea
(the one volume that could be filled without ill effect is probably a
tiny one).
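To make that concrete, this is roughly what I've been doing on each node
(the volume path is just an example of wherever that machine has room):

    # Put DFS block storage behind a fixed path, symlinked to whatever
    # volume actually has spare space on this particular machine
    mkdir -p /data/bigvolume/hadoop-dfs
    ln -s /data/bigvolume/hadoop-dfs /home/hadoop/disk1

    # Then hdfs-site.xml (hadoop-site.xml on older releases) points every
    # node at the same fixed path:
    #   <property>
    #     <name>dfs.data.dir</name>
    #     <value>/home/hadoop/disk1</value>
    #   </property>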

I was wondering whether anyone had any ideas. I sort of need a
local-node quota system ("this node should use no more than X GB"). I
was initially investigating disk quotas at the Unix filesystem level,
but thought I'd ask before I went down that path in case someone else
had a much better idea.
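The sort of thing I had in mind at the filesystem level, assuming a
dedicated 'hadoop' user on each node and a Linux ext3 /home volume (the
cap is a made-up number):

    # One-off setup: enable user quotas on the volume holding the data
    mount -o remount,usrquota /home
    quotacheck -cum /home
    quotaon /home

    # Hard-cap the hadoop user at ~50 GB (limits are in 1 KiB blocks)
    setquota -u hadoop 0 52428800 0 0 /home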

Obviously this is only useful for test clusters; in a real-world setup
the manageability of it simply wouldn't scale beyond a few handfuls of
nodes. But it would allow me to set up a reasonably sized cluster for
some good experiments without clobbering the existing processes and work
being done on these machines.

cheers,

Paul Smith
