spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-20662) Block jobs that have greater than a configured number of tasks
Date Fri, 02 Jun 2017 22:19:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035525#comment-16035525 ]

Marcelo Vanzin commented on SPARK-20662:
----------------------------------------

bq. For multiple users in an enterprise deployment, it's good to provide admin knobs. In this
case, an admin just wanted to block bad jobs.

Your definition of a bad job is the problem (well, one of the problems). "Number of tasks"
is not an indication that a job is large. Each task may be really small.
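
A minimal sketch of that point, assuming a local SparkContext (illustrative only; the
"heavy" work in the second job is only hinted at in the comments):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: task count is a poor proxy for how heavy a job actually is.
object TaskCountVsJobSize {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("task-count-vs-job-size").setMaster("local[*]"))

    // "Big" by task count, tiny in practice: 10,000 tasks that each do almost nothing.
    sc.parallelize(1 to 10000, numSlices = 10000).map(_ + 1).count()

    // "Small" by task count, potentially heavy in practice: 20 tasks, each of which
    // could scan or shuffle gigabytes of data in a real workload.
    sc.parallelize(1 to 20, numSlices = 20).map(_.toLong).count()

    sc.stop()
  }
}
{code}

A cap on task count would flag the first job and wave the second one through.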

Spark shouldn't be in the business of defining what is a good or bad job, and that doesn't
mean it's targeted at single-user rather than multi-user environments. It's just something
that needs to be controlled at a different layer. If the admin is really worried about
resource usage, he has control over the RM, and shouldn't rely on applications behaving
nicely to enforce those controls. Applications misbehave. Users mess with configuration.
Those are all things outside of the admin's control.
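
As a rough sketch of what control at the RM layer can look like on YARN, per-queue
CapacityScheduler limits (the queue name "adhoc" and all numbers below are made up for
illustration; the properties normally live in capacity-scheduler.xml):

{code}
yarn.scheduler.capacity.root.queues = default,adhoc
yarn.scheduler.capacity.root.adhoc.capacity = 20
yarn.scheduler.capacity.root.adhoc.maximum-capacity = 30
yarn.scheduler.capacity.root.adhoc.user-limit-factor = 1
yarn.scheduler.capacity.root.adhoc.maximum-applications = 50
yarn.scheduler.capacity.root.adhoc.maximum-am-resource-percent = 0.1
{code}

Limits like these hold regardless of how individual applications are configured.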

> Block jobs that have greater than a configured number of tasks
> --------------------------------------------------------------
>
>                 Key: SPARK-20662
>                 URL: https://issues.apache.org/jira/browse/SPARK-20662
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.0, 2.0.0
>            Reporter: Xuefu Zhang
>
> In a shared cluster, it's desirable for an admin to block large Spark jobs. While there
> might not be a single metric defining the size of a job, the number of tasks is usually a
> good indicator. Thus, it would be useful for the Spark scheduler to block a job whose
> number of tasks reaches a configured limit. By default, the limit could be infinite, to
> retain the existing behavior.
> MapReduce provides mapreduce.job.max.map and mapreduce.job.max.reduce, which block an MR
> job at job submission time.
> The proposed configuration is spark.job.max.tasks, with a default value of -1 (infinite).
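
A minimal sketch of what the proposed guard could look like at job-submission time
(hypothetical code, not an actual patch; the object and method names and the choice of
what to count as the job's tasks are assumptions):

{code:scala}
import org.apache.spark.{SparkConf, SparkException}

object MaxTasksGuard {
  // Reject a job whose task count exceeds spark.job.max.tasks; -1 (the default)
  // disables the check, preserving the existing behavior.
  def check(numTasks: Int, conf: SparkConf): Unit = {
    val maxTasks = conf.getInt("spark.job.max.tasks", -1)
    if (maxTasks >= 0 && numTasks > maxTasks) {
      throw new SparkException(
        s"Job requires $numTasks tasks, which exceeds spark.job.max.tasks=$maxTasks")
    }
  }
}
{code}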



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

