hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: quotas for size of intermediate map/reduce output?
Date Wed, 21 Sep 2011 23:36:23 GMT
We do track intermediate output usage, and if a job is using too much and can't be scheduled
anywhere on the cluster, the CS/JT (CapacityScheduler/JobTracker) will fail it. You'll need
hadoop-0.20.204 for this, though.
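
To make the idea concrete, here is a rough sketch of that kind of check. It is purely illustrative and is not the JobTracker's actual code: the Node class, the free_local_bytes field, and both function names are hypothetical, and it assumes the scheduler has some estimate of a task's intermediate output size. A task is placeable only on a node with enough free local disk, and the job is failed when no node qualifies.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical view of a cluster node, not a Hadoop class.
    name: str
    free_local_bytes: int  # free local disk available for intermediate output


def can_schedule(estimated_output_bytes, nodes):
    """True if at least one node has room for the estimated intermediate output."""
    return any(n.free_local_bytes >= estimated_output_bytes for n in nodes)


def check_job(estimated_output_bytes, nodes):
    """Return "FAILED" when the task fits nowhere on the cluster, else "SCHEDULABLE"."""
    if can_schedule(estimated_output_bytes, nodes):
        return "SCHEDULABLE"
    return "FAILED"
```

For example, with two nodes holding 10 and 50 bytes of free space, a task estimated at 40 bytes is schedulable, while one estimated at 60 bytes would be failed.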

Also, with MRv2 we are in the process of adding limits on disk usage for intermediate outputs,
logs, etc.

hth,
Arun

On Sep 21, 2011, at 3:45 PM, Matt Steele wrote:

> Hi All,
> 
> Is it possible to enforce a maximum to the disk space consumed by a map/reduce job's
> intermediate output?  It looks like you can impose limits on hdfs consumption, or, via the
> capacity scheduler, limits on the RAM that a map/reduce slot uses, or the number of slots
> used.
> 
> But if I'm worried that a job might exhaust the cluster's disk capacity during the shuffle,
> my sense is that I'd have to quarantine the job on a separate cluster.  Am I wrong?  Do you
> have any suggestions for me?
> 
> Thanks,
> Matt

