hadoop-common-user mailing list archives

From Vinod KV <vino...@yahoo-inc.com>
Subject Re: local node Quotas (for an R&D cluster)
Date Wed, 23 Sep 2009 06:08:26 GMT
Allen Wittenauer wrote:
>
> On 9/22/09 5:47 PM, "Ravi Phulari" <rphulari@yahoo-inc.com> wrote:
>
>   
>> Hello Paul, here is a quick answer to your question -
>> You can use the dfs.datanode.du.pct and dfs.datanode.du.reserved properties
>> in the hdfs-site.xml config file to configure the maximum local disk space
>> used by HDFS and MapReduce.
>>     
>
> No, that's incorrect.
>
> These values determine how much HDFS is *not* allowed to use.  There is no
> limit on how much MR can take.  This is exactly the opposite of what he and
> pretty much every other admin wants.  [Negative math is fun! Or something.]
>
> The only way to guarantee that HDFS and MR do not eat more space than you
> actually want is to create a separate file system.  In the case of the
> datanode, potentially run the datanode process with a file system quota
> at the Unix level.
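
(To make the semantics concrete: a dfs.datanode.du.reserved entry in
hdfs-site.xml looks roughly like the sketch below. The value is
illustrative only, and, as Allen points out, it tells the datanode how
much space per volume to leave free for non-DFS use; it does not cap
what MR or anything else can consume.)

    <!-- hdfs-site.xml (illustrative value only) -->
    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- bytes per volume the datanode must leave free for non-DFS use;
           this does NOT limit MapReduce's disk usage -->
      <value>10737418240</value>
    </property>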

I think it will be a useful feature in Map/Reduce to limit the disk
usage of the daemons to a configured value in the framework itself.
  - On the TT, this limit could be used to restrict, for example, the
disk usage of job/task localization and map outputs, and to fail the
task or reject new tasks if the limit is hit.
  - On the JT, it could be used to reject new jobs if the local disk
usage for job-related files crosses a threshold.

Will create a JIRA issue to see if these framework limits can be 
implemented.
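
A very rough sketch of the kind of TT-side check I have in mind is
below. The local directory path, the limit value, and the helper class
are all hypothetical and purely illustrative; nothing like this exists
in the framework or its configuration today. The JT side would be
analogous, just applied to the directory holding job-related files.

    import java.io.File;

    /** Hypothetical sketch of a local-disk-limit check on the TT. */
    public class LocalDiskLimitCheck {

      /** Recursively sum the bytes used under a directory. */
      static long usedBytes(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
          return dir.length();            // plain file (or unreadable dir)
        }
        long total = 0;
        for (File f : children) {
          total += f.isDirectory() ? usedBytes(f) : f.length();
        }
        return total;
      }

      /** True if the TT's local dir already exceeds the configured limit. */
      static boolean overLimit(File mapredLocalDir, long limitBytes) {
        return usedBytes(mapredLocalDir) >= limitBytes;
      }

      public static void main(String[] args) {
        File localDir = new File("/tmp/mapred/local");  // example path only
        long limit = 50L * 1024 * 1024 * 1024;          // e.g. 50 GB
        if (overLimit(localDir, limit)) {
          System.out.println("Limit hit: fail localization / reject new tasks");
        }
      }
    }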

Solving this in general in the framework, by limiting the disk usage of
the tasks themselves, is not possible: tasks may write to arbitrary
files on arbitrary disks/file systems on the TT. The only solution in
that case would be sandboxing/virtualization of each task.

Thanks,
+Vinod
