spark-issues mailing list archives

From "Simon King (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-19814) Spark History Server Out Of Memory / Extreme GC
Date Fri, 03 Mar 2017 19:59:45 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon King updated SPARK-19814:
-------------------------------
    Attachment: SparkHistoryCPUandRAM.png

Graph showing CPU usage (top) and RSS RAM (bottom). Note that the one run of SHS in the middle,
with a lower max heap setting, eventually spent much more CPU time on garbage collection.

> Spark History Server Out Of Memory / Extreme GC
> -----------------------------------------------
>
>                 Key: SPARK-19814
>                 URL: https://issues.apache.org/jira/browse/SPARK-19814
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.6.1, 2.0.0, 2.1.0
>         Environment: Spark History Server (we've run it on several different Hadoop distributions)
>            Reporter: Simon King
>         Attachments: SparkHistoryCPUandRAM.png
>
>
> Spark History Server runs out of memory, falls into GC thrashing, and eventually becomes unresponsive.
> This seems to happen more quickly with heavy use of the REST API. We've seen this with several
> versions of Spark.
> Running with the following settings (Spark 2.1):
> spark.history.fs.cleaner.enabled    true
> spark.history.fs.cleaner.interval   1d
> spark.history.fs.cleaner.maxAge     7d
> spark.history.retainedApplications  500
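
To make the reported settings easier to reason about, here is a minimal Python sketch that reads the four lines above into a dictionary and converts the duration values to seconds. The helper names `parse_settings` and `duration_seconds` and the unit table are hypothetical; the sketch assumes whitespace-separated key/value lines and simple suffixes such as `d`, and it does not reproduce Spark's own configuration parser.

```python
import re

# Settings copied verbatim from the report above.
REPORTED_SETTINGS = """\
spark.history.fs.cleaner.enabled    true
spark.history.fs.cleaner.interval   1d
spark.history.fs.cleaner.maxAge     7d
spark.history.retainedApplications  500
"""

# Hypothetical unit table for this sketch only (not Spark's full grammar).
_UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}


def parse_settings(text):
    """Parse whitespace-separated 'key value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def duration_seconds(value):
    """Convert a duration such as '1d' or '7d' into a number of seconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", value.strip().lower())
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


if __name__ == "__main__":
    settings = parse_settings(REPORTED_SETTINGS)
    for key in ("spark.history.fs.cleaner.interval",
                "spark.history.fs.cleaner.maxAge"):
        print(key, duration_seconds(settings[key]), "seconds")
    print("retained applications:",
          int(settings["spark.history.retainedApplications"]))
```

With the values reported here, the cleaner interval works out to 86400 seconds and the maximum age to 604800 seconds.
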
> We will eventually get errors like:
> 17/02/25 05:02:19 WARN ServletHandler:
> javax.servlet.ServletException: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
>   at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
>   at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
>   at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>   at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>   at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
>   at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
>   at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
>   at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:529)
>   at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.spark_project.jetty.server.Server.handle(Server.java:499)
>   at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>   at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
>   at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
>   at org.apache.spark.deploy.history.ApplicationCache.getSparkUI(ApplicationCache.scala:148)
>   at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:110)
>   at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:244)
>   at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:49)
>   at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
>   at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter$1.run(SubResourceLocatorRouter.java:158)
>   at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.getResource(SubResourceLocatorRouter.java:178)
>   at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.apply(SubResourceLocatorRouter.java:109)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:109)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:92)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:61)
>   at org.glassfish.jersey.process.internal.Stages.process(Stages.java:197)
>   at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:318)
>   at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
>   at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
>   at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
>   at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
>   at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
>   at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>   at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
>   at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
>   at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
> In our case, memory usage gradually increases over perhaps 2 days, then levels off
> near the max heap size (4G in our case). Often within 12-24 hours after that, GC activity
> starts to increase, resulting in more and more frequent errors like the stack trace above.
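
The growth in error frequency described above could be measured from the History Server log. The following is a minimal Python sketch, assuming each log entry begins with a timestamp in the `yy/MM/dd HH:mm:ss` form seen in the trace above. The function name `oom_entries_per_hour` is hypothetical, and each timestamped entry is counted at most once even though the same message appears again on the `Caused by:` line.

```python
from collections import Counter
from datetime import datetime

# Message text taken from the stack trace in the report.
OOM_MARKER = "java.lang.OutOfMemoryError: GC overhead limit exceeded"
# Assumed timestamp layout, matching "17/02/25 05:02:19" in the trace.
TS_FORMAT = "%y/%m/%d %H:%M:%S"
TS_LENGTH = len("17/02/25 05:02:19")


def oom_entries_per_hour(lines):
    """Count log entries that mention the GC-overhead OOM, grouped by hour."""
    counts = Counter()
    pending_hour = None  # hour of the current entry, until it has been counted
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
            pending_hour = ts.replace(minute=0, second=0)
        except ValueError:
            pass  # continuation line (exception message or stack frame)
        if OOM_MARKER in line and pending_hour is not None:
            counts[pending_hour] += 1
            pending_hour = None  # ignore repeats such as the "Caused by:" line
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    import sys
    # Usage: python this_script.py < history-server.log
    for hour, count in oom_entries_per_hour(sys.stdin).items():
        print(hour.strftime("%Y-%m-%d %H:00"), count)
```

A rising count per hour would match the pattern reported here; a flat or zero count would suggest the heap has not yet reached its limit.
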




