spark-issues mailing list archives

From "Rong Tang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25409) Speed up Spark History at start if there are tens of thousands of applications.
Date Fri, 21 Sep 2018 17:29:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623919#comment-16623919 ]

Rong Tang commented on SPARK-25409:
-----------------------------------

Closing this JIRA, as https://issues.apache.org/jira/browse/SPARK-6951 has done a great
job of speeding up loading.
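
For reference, the two settings proposed by the attached patch could be configured roughly like this (a sketch of a spark-defaults.conf fragment; these property names come from the patch itself and are not part of released Spark):

```properties
# Settings proposed in SPARK-25409's patch (illustrative; not in upstream Spark)
# Whether to replay event logs of incomplete attempts at startup
spark.history.fs.load.incomplete   false
# How many applications to load per batch before re-checking for updates
spark.history.fs.appRefreshNum     3000
```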

 

> Speed up Spark History at start if there are tens of thousands of applications.
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-25409
>                 URL: https://issues.apache.org/jira/browse/SPARK-25409
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Rong Tang
>            Priority: Major
>         Attachments: SPARK-25409.0001.patch
>
>
> We have a Spark history server storing 7 days of applications; it usually holds 10K to 20K attempts.
> We found that it can take hours at startup to load/replay the logs in the event-log folder, so newly finished applications may wait several hours before becoming visible. So I made 2 improvements.
>  # Since we run Spark on YARN, information about on-going applications can also be seen via the Resource Manager, so I introduce a flag, spark.history.fs.load.incomplete, to control whether logs for incomplete attempts are loaded.
>  # Incremental loading of applications. As noted, we have more than 10K applications stored, and loading all of them the first time can take hours, so I introduce a config, spark.history.fs.appRefreshNum, to say how many applications to load per batch; between batches the server gets a chance to check for the latest updates.
> Here are the benchmarks I did. Our system has 1K incomplete applications (they were not cleaned up for some reason; that is another issue I need to investigate), and an application's log can be gigabytes in size.
>  
> Not loading incomplete attempts:
> | |Load Count|Load incomplete apps|Total attempts|Time cost|Increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|All|No|13K|31 minutes|Yes|
>  
>  
> Limiting how many attempts to load per batch:
> 
> | |Load Count|Load incomplete apps|Total attempts|Worst cost|Increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|3000|Yes|13K|42 minutes, except the last 1.6K (the last 1.6K attempts took an extremely long 2.5 hours)|No|
>  
>  
> Limiting how many attempts to load per batch, and not loading incomplete jobs:
> 
> | |Load Count|Load incomplete apps|Total attempts|Worst cost|Avg|Increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes| |Yes|
> |2|3000|No|12K|17 minutes|10 minutes (41 minutes in total)|No|
>  
>  
> | |Load Count|Load incomplete apps|Total attempts|Worst cost|Avg|Increases with more attempts|
> |1 (current implementation)|All|Yes|20K|1 hour 52 minutes| |Yes|
> |2|3000|No|18.5K|20 minutes|18 minutes (2 hours 18 minutes in total)|No|
>  
>  
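
For illustration, the behavior described in the two improvements above can be sketched roughly like this (hypothetical helper and field names; this is not the actual FsHistoryProvider code from the patch):

```python
# Sketch of the two proposed behaviors: optionally skip incomplete attempts,
# and load at most `batch_size` applications per scan, so newly finished apps
# can become visible between batches. Field names are illustrative only.
def incremental_scan(event_logs, batch_size=3000, load_incomplete=False):
    # Filter out incomplete attempts unless the flag asks for them.
    pending = [log for log in event_logs
               if load_incomplete or log["complete"]]
    batches = []
    # Replay logs in fixed-size batches; between batches the real server
    # would publish what it has loaded and re-check the event-log folder.
    for i in range(0, len(pending), batch_size):
        batches.append([log["app_id"] for log in pending[i:i + batch_size]])
    return batches
```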



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

