spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-25837) Web UI does not respect spark.ui.retainedJobs in some instances
Date Mon, 29 Oct 2018 20:26:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25837:
------------------------------------

    Assignee: Apache Spark

> Web UI does not respect spark.ui.retainedJobs in some instances
> ---------------------------------------------------------------
>
>                 Key: SPARK-25837
>                 URL: https://issues.apache.org/jira/browse/SPARK-25837
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.3.1
>         Environment: Reproduction Environment:
> Spark 2.3.1
> Dataproc 1.3-deb9
> 1x master 4 vCPUs, 15 GB
> 2x workers 4 vCPUs, 15 GB
>  
>            Reporter: Patrick Brown
>            Assignee: Apache Spark
>            Priority: Minor
>         Attachments: Screen Shot 2018-10-23 at 4.40.51 PM (1).png
>
>
> Expected Behavior: Web UI only displays 1 completed job and remains responsive.
> Actual Behavior: Both during job execution and for a considerable time after all jobs
> have completed, the UI retains many completed jobs, which limits responsiveness.
>  
> To reproduce:
>  
>  > spark-shell --conf spark.ui.retainedJobs=1
>   
>  scala> import scala.concurrent._
>  scala> import scala.concurrent.ExecutionContext.Implicits.global
>  scala> for (i <- 0 until 50000) { Future { println(sc.parallelize(0 until i).collect.length) } }
>
> The attached screenshot shows the state of the web UI after running the repro code. The UI
> is displaying some 43k completed jobs and takes a long time to load. After a few minutes of
> inactivity this clears out; however, in an application that continues to submit jobs every
> once in a while, the issue persists.
>  
> The issue seems to appear when running multiple jobs at once as well as in sequence for
> a while, and it may also be related to high master CPU usage (hence the collect in the
> repro code). My rough guess is that whatever clears out completed jobs gets overwhelmed:
> during the repro, htop on the master reported almost full CPU usage across all 4 cores.
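
One way to watch the backlog without loading the web UI is to poll the status tracker from the same spark-shell session. This is a minimal sketch, not part of the original report. It assumes that `sc.statusTracker` reads the same job store that backs the UI in this Spark version, and that the repro jobs are not assigned to a job group. The helper name `retainedCompletedJobs` and the one-second sampling interval are illustrative choices.

```scala
// Run inside the spark-shell session used for the repro; `sc` is the shell's SparkContext.
import org.apache.spark.JobExecutionStatus

// Hypothetical helper: count the jobs the status tracker still knows about
// that have finished successfully.
def retainedCompletedJobs(): Int = {
  val tracker = sc.statusTracker
  tracker
    .getJobIdsForGroup(null) // null selects jobs that are not tied to a job group
    .count(id => tracker.getJobInfo(id).exists(_.status == JobExecutionStatus.SUCCEEDED))
}

// Sample once per second for a minute (arbitrary interval and duration).
for (_ <- 1 to 60) {
  println(s"retained completed jobs: ${retainedCompletedJobs()}")
  Thread.sleep(1000)
}
```

If the assumption holds, a count that stays far above spark.ui.retainedJobs while jobs are being submitted, and drops only after a period of inactivity, would match the behavior described above.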



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

