spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-20657) Speed up Stage page
Date Mon, 18 Dec 2017 22:06:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-20657:
------------------------------------

    Assignee:     (was: Apache Spark)

> Speed up Stage page
> -------------------
>
>                 Key: SPARK-20657
>                 URL: https://issues.apache.org/jira/browse/SPARK-20657
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Web UI
>    Affects Versions: 2.3.0
>            Reporter: Marcelo Vanzin
>
> The Stage page in the UI is very slow when a large number of tasks exist (tens of
> thousands). The new work being done in SPARK-18085 makes that worse, since it adds
> potential disk access to the mix.
> A lot of the slowness comes from the code loading all the tasks into memory, sorting a
> very large list, and running a lot of calculations over all the data. Both can be avoided
> with the new app state store: smarter indices let data be read from the store already
> sorted in the desired order, and pre-calculated statistics about metrics avoid redoing
> that work on every page access.
> Then only the tasks on the current page (100 items by default) need to actually be loaded.
> This also saves a lot of memory, not just CPU time.
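
The idea in the description above can be sketched roughly as follows. This is a minimal
illustration in Python, not Spark's actual implementation: the TaskStore class, its
fields, and the choice of task duration as the sort key are all hypothetical, and a plain
in-memory sorted list stands in for a store index. It only shows the two techniques named
in the issue: reading one page from an already-sorted index, and keeping summary
statistics up to date on write instead of recomputing them on every page view.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; real task data has many more fields."""
    task_id: int
    duration_ms: int


class TaskStore:
    """Hypothetical in-memory stand-in for a store with a sorted index.

    Assumes each task is added exactly once.
    """

    def __init__(self):
        self._by_id = {}
        # Index kept sorted by (duration_ms, task_id) at write time,
        # so reads never need to sort the full task list.
        self._by_duration = []
        # Statistics maintained incrementally on every write.
        self._count = 0
        self._total_duration_ms = 0
        self._max_duration_ms = 0

    def add(self, task):
        self._by_id[task.task_id] = task
        bisect.insort(self._by_duration, (task.duration_ms, task.task_id))
        self._count += 1
        self._total_duration_ms += task.duration_ms
        self._max_duration_ms = max(self._max_duration_ms, task.duration_ms)

    def page(self, page_number, page_size=100):
        """Return only the tasks on one page, in index order."""
        start = page_number * page_size
        keys = self._by_duration[start:start + page_size]
        return [self._by_id[task_id] for _, task_id in keys]

    def summary(self):
        """Return pre-calculated statistics without touching the tasks."""
        mean = self._total_duration_ms / self._count if self._count else 0.0
        return {
            "count": self._count,
            "mean_ms": mean,
            "max_ms": self._max_duration_ms,
        }
```

With this layout, rendering a page costs one slice of the index plus at most page_size
lookups, regardless of how many tasks the stage has. Statistics such as quantiles would
need a different incremental structure than the simple running totals shown here.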



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

