spark-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22683) Allow tuning the number of dynamically allocated executors wrt task number
Date Tue, 12 Dec 2017 20:59:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288262#comment-16288262 ]

Xuefu Zhang commented on SPARK-22683:
-------------------------------------

Hi [~jcuquemelle], thanks for working on this and bringing up the efficiency problem associated
with dynamic allocation. We have also seen a significant increase in resource consumption at
our company when workloads were migrated from MR to Spark (via Hive). Thus, I believe there
is a strong need to improve Spark's efficiency in addition to its performance.

While your proposal has merit, I largely concur with Sean that it may not be universally
applicable: it addresses a particular workload rather than the whole class of problems. Take
MR as an example: it also allocates as many mappers/reducers as there are map or reduce tasks,
yet it offers higher efficiency than Spark in many cases. The inefficiency associated with
dynamic allocation has many aspects, such as executors idling out, bigger executors, and many
stages in a Spark job (rather than only two stages in MR). Since there is a class of users
conscious of resource consumption, especially as many move their workloads to the cloud, a
more generic solution is needed for such users.

I have been thinking about a proposal that introduces an MR-style resource allocation scheme
alongside dynamic allocation. Such a mechanism would start from the MR model, but it could be
further enhanced to beat MR and better fit Spark's execution model. This would be a great
alternative to dynamic allocation.

While dynamic allocation is certainly performance-centric, the new allocation scheme could
still offer a good performance improvement (compared to MR) while being efficiency-centric.

As a starting point, I'm going to create a JIRA and move the discussion of this proposal
over there. You're welcome to share your thoughts and/or contribute.

Thanks.

> Allow tuning the number of dynamically allocated executors wrt task number
> --------------------------------------------------------------------------
>
>                 Key: SPARK-22683
>                 URL: https://issues.apache.org/jira/browse/SPARK-22683
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Julien Cuquemelle
>              Labels: pull-request-available
>
> Let's say an executor has spark.executor.cores / spark.task.cpus task slots.
> The current dynamic allocation policy allocates enough executors
> to have each task slot execute a single task, which minimizes latency
> but wastes resources when tasks are small relative to the executor
> allocation overhead.
> Adding a tasksPerExecutorSlot parameter makes it possible to specify how
> many tasks a single slot should ideally execute, mitigating the overhead
> of executor allocation.
> PR: https://github.com/apache/spark/pull/19881
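
For illustration, here is a minimal Scala sketch of the allocation target implied by the
description above. The parameter name tasksPerExecutorSlot comes from the issue text, but the
object, method name, and the worked numbers are illustrative assumptions, not the actual code
in the PR:

    // Illustrative sketch only; not the actual ExecutorAllocationManager logic.
    object AllocationSketch {
      // Number of executors needed for the pending tasks of a stage.
      // executorCores = spark.executor.cores, taskCpus = spark.task.cpus,
      // tasksPerSlot  = the proposed tasksPerExecutorSlot setting.
      def maxExecutorsNeeded(pendingTasks: Int,
                             executorCores: Int,
                             taskCpus: Int,
                             tasksPerSlot: Int): Int = {
        val slotsPerExecutor = executorCores / taskCpus
        // The current policy is the special case tasksPerSlot = 1:
        // one slot per pending task.
        math.ceil(pendingTasks.toDouble / (slotsPerExecutor * tasksPerSlot)).toInt
      }
    }

For example, with 8 cores per executor, 1 CPU per task, and 2 tasks per slot, 1000 pending
tasks would target ceil(1000 / 16) = 63 executors, versus ceil(1000 / 8) = 125 under the
current one-task-per-slot policy.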





