hadoop-common-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6592) Scheduler: Pause button desirable
Date Tue, 23 Feb 2010 09:27:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837163#action_12837163 ]

Hong Tang commented on HADOOP-6592:

bq. Isn't this a duplicate of MAPREDUCE-1227?
No. As Todd mentioned, MAPREDUCE-1227 is for a cluster-wide pause. This one is to pause individual jobs.

There are cases where pausing individual jobs could be desirable. For instance, suppose I am
running a very large job and, toward the end, find that some tasks start to fail because a disk
quota limit has been reached. I'd like to pause the job, free up some quota space, and resume
the job. Pausing a job may also be useful for debugging.

bq. Unfortunately people do not realize that this has significant (negative) consequences
to the cluster; in particular, map-outputs consume valuable temporary storage and make this
feature un-viable for Map-Reduce.
Agreed. However, the current implementation does not enforce an absolute time limit on tasks,
so a job can still tie up temporary storage if some of its tasks are very slow. Perhaps we
should meter a job's temporary storage usage as the product of data volume and duration
(unit: "MBxMin") and set a limit on that. The meter would keep running while the job is
paused, so tasks or the entire job could be killed once the job's temp storage usage quota
is reached.
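
To make the metering idea concrete, here is a minimal sketch of a volume-times-duration meter. It is only an illustration of the proposal above, not existing Hadoop code: the class name `TempStorageMeter`, its fields, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TempStorageMeter:
    """Accumulates a job's temp-storage usage in MB x minutes (hypothetical)."""

    limit_mb_min: float        # assumed per-job quota, in MB x Min
    used_mb_min: float = 0.0   # usage charged so far

    def record(self, held_mb: float, minutes: float) -> None:
        # Charge the volume held over an interval. This is called whether or
        # not the job is paused, so a paused job keeps accruing usage.
        self.used_mb_min += held_mb * minutes

    def over_quota(self) -> bool:
        # True once the quota is reached; the caller would then kill
        # tasks or the whole job.
        return self.used_mb_min >= self.limit_mb_min


meter = TempStorageMeter(limit_mb_min=10_000)
meter.record(held_mb=500, minutes=10)   # job running, 500 MB of map output held
meter.record(held_mb=500, minutes=10)   # job paused, map output still on disk
print(meter.used_mb_min, meter.over_quota())
```

The point of charging by the product is that a paused job holding a small amount of data for a long time is treated the same as a running job holding a lot of data briefly.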

> Scheduler: Pause button desirable
> ---------------------------------
>                 Key: HADOOP-6592
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6592
>             Project: Hadoop Common
>          Issue Type: Wish
>            Reporter: Adam Kramer
>            Priority: Minor
> It would be lovely if, from the jobtracker page, I could click a button that's not "kill"
> or "fail" but ..."pause."
> The pause button would stop a certain task from starting any more mappers or reducers.
> They would all wait in the "pending" stage until the job is "un-paused." Currently-running
> tasks would continue to run, and then complete, thus freeing the resources for other jobs.
> This would help a lot for systems (esp. Hive) in which one or two jobs are hogging a
> lot of mappers or reducers. The ones they have would finish, and then other jobs could "catch
> up," and then they could be unpaused for a while. This would also allow for user-level throttling
> of their jobs in instances where they need a lot of resources but have the time to spare.
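
The pause semantics requested in the issue can be sketched with a toy slot-assignment loop. This is a simplified model under stated assumptions, not the JobTracker's actual scheduling code: the `Job` class, the `assign_slots` function, and the slot counts are hypothetical.

```python
class Job:
    """Toy job: pending tasks wait, running tasks finish (hypothetical model)."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = pending   # tasks not yet started
        self.running = 0         # tasks currently holding a slot
        self.paused = False


def assign_slots(jobs, free_slots):
    """Hand free slots to un-paused jobs in order; paused jobs are skipped."""
    for job in jobs:
        if job.paused:
            continue             # pending tasks stay pending while paused
        started = min(job.pending, free_slots)
        job.pending -= started
        job.running += started
        free_slots -= started
    return free_slots            # slots left unassigned


big, small = Job("big", pending=100), Job("small", pending=5)
assign_slots([big, small], free_slots=10)   # big takes all 10 slots
big.paused = True                           # "pause" button pressed
big.running -= 10                           # its running tasks run to completion
assign_slots([big, small], free_slots=10)   # freed slots now go to small
```

Pausing never touches `running`, only whether `pending` tasks are eligible, which matches the request that currently-running tasks complete normally.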

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
