Date: Fri, 2 Jun 2017 22:19:04 +0000 (UTC)
From: "Marcelo Vanzin (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-20662) Block jobs that have greater than a configured number of tasks

    [ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035525#comment-16035525 ]

Marcelo Vanzin commented on SPARK-20662:
----------------------------------------

bq. For multiple users in an enterprise deployment, it's good to provide admin knobs. In this case, an admin just wanted to block bad jobs.

Your definition of a bad job is the problem (well, one of the problems). "Number of tasks" is not an indication that a job is large; each task may be really small. Spark shouldn't be in the business of defining what is a good or bad job, and that doesn't mean it's targeted at single-user vs. multi-user environments.
It's just something that needs to be controlled at a different layer. If the admin is really worried about resource usage, they have control over the resource manager (RM) and shouldn't rely on applications behaving nicely to enforce those controls. Applications misbehave, and users mess with configuration; those are all things outside of the admin's control.

> Block jobs that have greater than a configured number of tasks
> --------------------------------------------------------------
>
>                 Key: SPARK-20662
>                 URL: https://issues.apache.org/jira/browse/SPARK-20662
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.0, 2.0.0
>            Reporter: Xuefu Zhang
>
> In a shared cluster, it's desirable for an admin to be able to block large Spark jobs. While there might not be a single metric that defines the size of a job, the number of tasks is usually a good indicator. Thus, it would be useful for the Spark scheduler to block a job whose number of tasks reaches a configured limit. By default, the limit could simply be infinite, to retain the existing behavior.
> MapReduce has mapreduce.job.max.map and mapreduce.job.max.reduce, which block an MR job at job submission time.
> The proposed configuration is spark.job.max.tasks, with a default value of -1 (infinite).
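
For context, below is a minimal, best-effort sketch of what an application-level approximation of the proposed limit could look like with the existing SparkListener API. The class name TaskCountLimiter, the maxTasksPerStage parameter, and the example threshold are illustrative assumptions, not part of the proposal; because the listener bus is asynchronous, this cancels an oversized stage after it is submitted rather than blocking the job up front, which is exactly the kind of application-side enforcement the comment above argues an admin should not have to rely on.

{code}
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted}

// Best-effort approximation of the proposed per-job task limit at the
// application layer: cancel any stage whose task count exceeds a threshold.
// The class name, parameter, and threshold are illustrative only.
class TaskCountLimiter(sc: SparkContext, maxTasksPerStage: Int) extends SparkListener {
  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
    val info = event.stageInfo
    if (maxTasksPerStage > 0 && info.numTasks > maxTasksPerStage) {
      // The listener bus is asynchronous, so the stage may already have started;
      // this cancels it as soon as the event is observed rather than blocking submission.
      sc.cancelStage(info.stageId)
    }
  }
}

// Example wiring, with a hypothetical limit of 10000 tasks per stage:
// sc.addSparkListener(new TaskCountLimiter(sc, 10000))
{code}

The RM-level alternative argued for in the comment would instead lean on existing knobs such as YARN queue capacities, spark.dynamicAllocation.maxExecutors, or spark.cores.max, which cap what an application can consume regardless of how many tasks it schedules.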