hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain
Date Tue, 02 May 2017 03:54:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992296#comment-15992296
] 

Xuefu Zhang commented on HIVE-16552:
------------------------------------

[~lirui], as mentioned in the description, the main use of this property is to block large/bad
queries that take a lot of resources, such as those scanning a lot of partitions. YARN resource
settings don't prevent users from submitting such a large query. MR has options like mapreduce.job.max.map,
whereas Spark doesn't provide such options.

Large/bad queries not only run longer but also create a huge load on HS2 and HDFS. This option
gives an admin a way to control such queries.

Regular users don't have to worry about this configuration; they just need to rewrite their
blocked queries. It's advisable for an admin to blacklist this configuration so users cannot
override it.

Also, for admins or regular users who don't have such a problem, the default value (no limit)
will just work for them.

Make sense?
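
For illustration only, here is a sketch of how an admin might configure this, assuming the property name from the proposal and an arbitrary example limit of 50000 tasks; the restricted-list entry uses the existing hive.conf.restricted.list mechanism to keep users from overriding the limit at the session level:

```xml
<!-- hive-site.xml (hypothetical usage sketch) -->
<property>
  <!-- Proposed property: fail queries whose Spark job exceeds this many tasks;
       default -1 means no limit. The value 50000 is an arbitrary example. -->
  <name>hive.spark.job.max.tasks</name>
  <value>50000</value>
</property>
<property>
  <!-- Blacklist the property so regular users cannot change it with SET -->
  <name>hive.conf.restricted.list</name>
  <value>hive.spark.job.max.tasks</value>
</property>
```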

> Limit the number of tasks a Spark job may contain
> -------------------------------------------------
>
>                 Key: HIVE-16552
>                 URL: https://issues.apache.org/jira/browse/HIVE-16552
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-16552.1.patch, HIVE-16552.patch
>
>
> It's commonly desirable to block bad and big queries that take a lot of YARN resources.
One approach, similar to mapreduce.job.max.map in MapReduce, is to stop a query that invokes
a Spark job containing too many tasks. The proposal here is to introduce hive.spark.job.max.tasks
with a default value of -1 (no limit), which an admin can set to block queries that trigger
too many Spark tasks.
> Please note that this control knob applies to a single Spark job, though it's possible that
one query can trigger multiple Spark jobs (such as in the case of a map join). Nevertheless, the
proposed approach is still helpful.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
