hadoop-mapreduce-user mailing list archives

From Matt Steele <rmattste...@gmail.com>
Subject Re: quotas for size of intermediate map/reduce output?
Date Wed, 21 Sep 2011 23:59:19 GMT
Thanks for this info; it sounds like we should upgrade to 0.20.204.

If more than one job is running when the cluster loses its ability to
schedule new tasks due to insufficient disk space, do you know what logic
the jobtracker uses to decide which job to kill?


On Wed, Sep 21, 2011 at 4:36 PM, Arun C Murthy <acm@hortonworks.com> wrote:

> We do track intermediate output used and if a job is using too much and
> can't be scheduled anywhere on a cluster the CS/JT will fail it. You'll need
> hadoop-0.20.204 for this though.
>
> Also, with MRv2 we are in the process of adding limits on disk usage for
> intermediate outputs, logs etc.
>
> hth,
> Arun
>
> On Sep 21, 2011, at 3:45 PM, Matt Steele wrote:
> > Hi All,
> >
> > Is it possible to enforce a maximum to the disk space consumed by a
> > map/reduce job's intermediate output?  It looks like you can impose limits
> > on hdfs consumption, or, via the capacity scheduler, limits on the RAM that
> > a map/reduce slot uses, or the number of slots used.
> >
> > But if I'm worried that a job might exhaust the cluster's disk capacity
> > during the shuffle, my sense is that I'd have to quarantine the job on a
> > separate cluster.  Am I wrong?  Do you have any suggestions for me?
> >
> > Thanks,
> > Matt
