spark-issues mailing list archives

From "Ingo Schuster (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21176) Master UI hangs with spark.ui.reverseProxy=true if the master node has many CPUs
Date Thu, 22 Jun 2017 09:01:11 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ingo Schuster updated SPARK-21176:
----------------------------------
    Description: 
In reverse proxy mode, Spark exhausts the Jetty thread pool if the master node has too many CPUs or the cluster has too many executors:

For each connector, Jetty creates Selector threads: minimum 4, maximum half the number of
available CPUs:
{{  selectors>0?selectors:Math.max(1,Math.min(4,Runtime.getRuntime().availableProcessors()/2)));}}
(see https://github.com/eclipse/jetty.project/blob/jetty-9.3.x/jetty-server/src/main/java/org/eclipse/jetty/server/ServerConnector.java)

In reverse proxy mode, a connector is set up for each executor plus one for the master UI.
I have a system with 88 CPUs on the master node and 7 executors, so Jetty tries to instantiate 8 * 44 = 352 selector threads; but since the QueuedThreadPool is initialized with 200 threads by default, the UI gets stuck.
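To make the arithmetic concrete, here is a minimal sketch (in Scala, using only the numbers from this report) of why the default pool is exhausted:
{code:scala}
// Minimal sketch of the arithmetic above; 88 CPUs and 7 executors are the figures from this report.
val cpus = 88
val selectorsPerConnector = cpus / 2                     // 44 selector threads per connector
val connectors = 7 + 1                                   // one per proxied executor UI plus one for the master UI
val totalSelectors = connectors * selectorsPerConnector  // 8 * 44 = 352
val defaultMaxThreads = 200                              // Jetty's QueuedThreadPool default
assert(totalSelectors > defaultMaxThreads)               // the selectors alone exhaust the pool, so the UI hangs
{code}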

I have patched JettyUtils.scala to enlarge the thread pool ({{val pool = new QueuedThreadPool(400)}}). With this hack, the UI works.
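For reference, a hedged sketch of what this workaround looks like where JettyUtils.scala builds the server's thread pool (the surrounding code is paraphrased; only the enlarged pool size is the point):
{code:scala}
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.util.thread.QueuedThreadPool

// Raise maxThreads above the ~352 selector threads created in this setup.
// 400 is an ad-hoc value that merely confirms the diagnosis, not a proposed default.
val pool = new QueuedThreadPool(400)
pool.setDaemon(true)            // UI worker threads should not keep the JVM alive
val server = new Server(pool)   // hand the enlarged pool to Jetty
{code}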

Obviously, the Jetty defaults are meant for a real web server: if that server has 88 CPUs, you certainly expect a lot of traffic.
For the Spark admin UI, however, there will rarely be concurrent accesses to the same application or the same executor.
I therefore propose to dramatically reduce the number of selector threads that get instantiated, at least by default.

I will propose a fix in a pull request.
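One possible shape of such a fix, sketched here purely as an illustration and not as the actual pull request, is to pass explicit acceptor and selector counts when each ServerConnector is created, instead of relying on Jetty's CPU-based default:
{code:scala}
import org.eclipse.jetty.server.{HttpConnectionFactory, Server, ServerConnector}

val server = new Server()
// One acceptor and one selector per connector should be plenty for a proxied executor UI,
// which sees very little concurrent traffic; passing -1 would fall back to the CPU-based default.
val connector = new ServerConnector(server, 1, 1, new HttpConnectionFactory())
connector.setPort(0)            // illustrative only: bind to any free port
server.addConnector(connector)
{code}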

  was:
When *reverse proxy is enabled*
{quote}
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/
{quote}
first of all, any invocation of the Spark master web UI hangs forever, both locally (e.g. http://192.168.10.16:25001) and via the external URL, with no data received.
One, sometimes two, Spark applications succeed without error, and then the workers start throwing exceptions:
{quote}
Caused by: java.io.IOException: Failed to connect to /192.168.10.16:25050
{quote}
The application dies during creation of the SparkContext:
{quote}
2017-05-22 16:11:23 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://node0101:25000...
2017-05-22 16:11:23 INFO  TransportClientFactory:254 - Successfully created connection to
node0101/192.168.10.16:25000 after 169 ms (132 ms spent in bootstraps)
2017-05-22 16:11:43 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://node0101:25000...
2017-05-22 16:12:03 INFO  StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://node0101:25000...
2017-05-22 16:12:23 ERROR StandaloneSchedulerBackend:70 - Application has been killed. Reason:
All masters are unresponsive! Giving up.
2017-05-22 16:12:23 WARN  StandaloneSchedulerBackend:66 - Application ID is not initialized
yet.
2017-05-22 16:12:23 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService'
on port 25056.
.....
Caused by: java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers
on a running MetricsSystem
{quote}

*This definitely does not happen when reverse proxy is not enabled!*


> Master UI hangs with spark.ui.reverseProxy=true if the master node has many CPUs
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-21176
>                 URL: https://issues.apache.org/jira/browse/SPARK-21176
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0, 2.2.1
>         Environment: ppc64le GNU/Linux, POWER8; only the master node is reachable externally, other nodes are in an internal network
>            Reporter: Ingo Schuster
>              Labels: network, web-ui
>
> In reverse proxy mode, Spark exhausts the Jetty thread pool if the master node has too many CPUs or the cluster has too many executors:
> For each connector, Jetty creates Selector threads: minimum 4, maximum half the number
of available CPUs:
> {{  selectors>0?selectors:Math.max(1,Math.min(4,Runtime.getRuntime().availableProcessors()/2)));}}
> (see https://github.com/eclipse/jetty.project/blob/jetty-9.3.x/jetty-server/src/main/java/org/eclipse/jetty/server/ServerConnector.java)
> In reverse proxy mode, a connector is set up for each executor and one for the master
UI.
> I have a system with 88 CPUs on the master node and 7 executors. Jetty tries to instantiate
8*44 = 352 selector threads, but since the QueuedThreadPool is initialized with 200 threads
by default, the UI gets stuck.
> I have patched JettyUtils.scala to enlarge the thread pool ({{val pool = new QueuedThreadPool(400)}}). With this hack, the UI works.
> Obviously, the Jetty defaults are meant for a real web server: if that server has 88 CPUs, you certainly expect a lot of traffic.
> For the Spark admin UI however, there will rarely be concurrent accesses for the same
application or the same executor.
> I therefore propose to dramatically reduce the number of selector threads that get instantiated
- at least by default.
> I will propose a fix in a pull request.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


