spark-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR
Date Wed, 20 Dec 2017 05:43:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297914#comment-16297914
] 

Xuefu Zhang commented on SPARK-22765:
-------------------------------------

Alright, I tested upfront allocation and its combinations with other improvement ideas and
here is what I found:

1. The query: a single, fairly complicated query, represented here by one of its main Spark jobs:
{code}
Status: Running (Hive on Spark job[4])
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-10 .......         0      FINISHED    339        339        0        0       0  
Stage-11 .......         0      FINISHED    201        201        0        0       0  
Stage-12 .......         0      FINISHED    191        191        0        0       0  
Stage-13 .......         0      FINISHED    178        178        0        0       0  
Stage-14 .......         0      FINISHED    115        115        0        0       0  
Stage-15 .......         0      FINISHED    105        105        0        0       0  
Stage-16 .......         0      FINISHED    592        592        0        0       0  
Stage-17 .......         0      FINISHED    191        191        0        0       0  
Stage-4 ........         0      FINISHED    178        178        0        0       0  
Stage-5 ........         0      FINISHED    115        115        0        0       0  
Stage-6 ........         0      FINISHED    105        105        0        0       0  
Stage-7 ........         0      FINISHED    339        339        0        0       0  
Stage-8 ........         0      FINISHED    201        201        0        0       0  
Stage-9 ........         0      FINISHED    191        191        0        0       0  
--------------------------------------------------------------------------------------
{code}
2. Without any improvement, with the default 60s idleTime, Spark uses more than 3X the resources
compared to MR.
3. With idleTime=5s and the improvement in SPARK-21656, Spark uses about 2X resources.
4. Same as #3, but with idleTime=1s, Spark uses 1.4X resources.
5. Same as #3, but with additional upfront allocation, Spark also uses 1.4X resources.
6. Same as #4, but with the additional improvement in SPARK-22683 (factor=2), Spark uses 1.2X
resources.
7. Same as #6, but with factor=3, Spark uses 0.8X resources.
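For reference, the idleTime runs in #2–#4 map onto the standard dynamic allocation settings; a sketch of the relevant spark-defaults entries is below. The "factor" knob in #6–#7 comes from the SPARK-22683 patch, which was still under review at the time, so that property name is an assumption and may differ in the final change:

{code}
# Dynamic allocation requires the external shuffle service
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true

# Idle timeout varied across runs: 60s (default, #2), 5s (#3), 1s (#4)
spark.dynamicAllocation.executorIdleTimeout  1s

# Factor from the SPARK-22683 patch (property name assumed, patch not yet merged)
# spark.dynamicAllocation.fullExecutorAllocationDivisor  3
{code}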

While this is just one query and far from conclusive, it clearly shows how much these
considerations can impact efficiency. Mileage will surely vary, but this at least demonstrates
that Spark has a lot of room to improve resource utilization efficiency with respect to scheduling.

> Create a new executor allocation scheme based on that of MR
> -----------------------------------------------------------
>
>                 Key: SPARK-22765
>                 URL: https://issues.apache.org/jira/browse/SPARK-22765
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 1.6.0
>            Reporter: Xuefu Zhang
>
> Many users migrating their workload from MR to Spark find a significant resource consumption
hike (e.g., SPARK-22683). While this might not be a concern for users that are more performance
centric, for cost-conscious users such a hike creates a migration obstacle. This situation
can get worse as more users move to the cloud.
> Dynamic allocation makes it possible for Spark to be deployed in a multi-tenant environment.
But given its performance-centric design, its inefficiency has also unfortunately shown up, especially
when compared with MR. Thus, we believe an MR-style scheduler still has merit. Based
on our research, the inefficiency associated with dynamic allocation has many causes,
such as executors idling out, bigger executors, and the many stages (rather than only 2 stages in MR)
in a Spark job.
> Rather than fine-tuning dynamic allocation for efficiency, the proposal here is to add
a new, efficiency-centric scheduling scheme based on that of MR. Such an MR-based scheme can
be further enhanced and better adapted to Spark's execution model. This alternative is expected
to offer a good performance improvement over MR while achieving similar or even better
efficiency than MR.
> Inputs are greatly welcome!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

