spark-issues mailing list archives

From "Arun Achuthan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21303) Web-UI shows some Jobs get stuck randomly and stays like that. Neither able to kill
Date Wed, 12 Jul 2017 03:20:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun Achuthan updated SPARK-21303:
----------------------------------
    Attachment: Executors-2017-07-11 at 6.44.12 PM.png
                Persist Incoming Event Streams - Thread dump for executor 0.html
                Persist Incoming Event Streams - Thread dump for executor 1.html
                Persist Incoming Event Streams - Thread dump for executor 2.html
                Persist Incoming Event Streams - Thread dump for executor 3.html
                Persist Incoming Event Streams - Thread dump for executor 4.html
                Streaming-2017-07-11 at 6.51.14 PM.png

> Web-UI shows some Jobs get stuck randomly and stays like that. Neither able to kill
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-21303
>                 URL: https://issues.apache.org/jira/browse/SPARK-21303
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: Kubernetes 1.4.12 on AWS 
> OS Ubuntu
> Spark 2.1.1
> Cassandra 3.9
>            Reporter: Arun Achuthan
>         Attachments: Executors-2017-07-11 at 6.44.12 PM.png,
>                      Persist Incoming Event Streams - Thread dump for executor 0.html,
>                      Persist Incoming Event Streams - Thread dump for executor 1.html,
>                      Persist Incoming Event Streams - Thread dump for executor 2.html,
>                      Persist Incoming Event Streams - Thread dump for executor 3.html,
>                      Persist Incoming Event Streams - Thread dump for executor 4.html,
>                      Streaming-2017-07-11 at 6.51.14 PM.png
>
>
> We are running a streaming application that had been running without any issues for a long time.
> Over the last few days, some jobs have randomly appeared stuck in the web UI. This doesn't
> stop the application, as the jobs that follow complete successfully. The stuck jobs remain in the
> web UI with no progress. These are the observations we made. At the time the first job
> is shown as stuck in the UI, the driver logs contain this:
>
> 2017-07-04 05:33:20,189 ERROR [dag-scheduler-event-loop] org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
>
> For every other randomly stuck job, the driver logs contain entries like the following at the same time:
>
> 2017-07-04 05:33:20,194 WARN [dispatcher-event-loop-0] org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
>
> 2017-07-04 05:49:31,571 WARN [dag-scheduler-event-loop] org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Tue Jul 04 05:34:20 UTC 2017
>
> After the jobs start getting stuck, we experience performance drops as well
> as scheduling delays within the application. We couldn't find any other significant errors
> in the driver logs.
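
The log messages quoted above are consistent with a bounded listener-event queue: once the queue is full, new events are dropped, and a later warning reports how many were dropped since the previous report. The Python sketch below is a toy model of that behavior, not Spark's actual code. The class `BoundedEventBus` and its methods are hypothetical names chosen for illustration. It assumes the "last report" timestamp starts at zero, which would explain why the first warning reads "since Thu Jan 01 00:00:00 UTC 1970". If a dropped event happens to be the one that marks a job as finished, a UI built from these events could keep showing that job as active.

```python
import queue
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class BoundedEventBus:
    """Toy model of a bounded listener-event queue (hypothetical, not Spark code)."""

    def __init__(self, capacity):
        self._events = queue.Queue(maxsize=capacity)
        self._dropped = 0         # events dropped since the last report
        self._last_report_ms = 0  # assumption: starts at 0, i.e. the Unix epoch

    def post(self, event):
        """Enqueue without blocking; return False if the event was dropped."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            self._dropped += 1
            return False

    def drain(self):
        """Simulate a listener catching up by consuming all queued events."""
        drained = []
        while not self._events.empty():
            drained.append(self._events.get_nowait())
        return drained

    def report_drops(self, now_ms):
        """Return a warning line if events were dropped, else None."""
        if self._dropped == 0:
            return None
        since = EPOCH + timedelta(milliseconds=self._last_report_ms)
        message = "Dropped {} events since {}".format(
            self._dropped, since.strftime("%a %b %d %H:%M:%S UTC %Y"))
        # Reset the counter and remember when this report was made.
        self._dropped = 0
        self._last_report_ms = now_ms
        return message
```

With a capacity of 2, a third `post` call returns `False`, and the first `report_drops` call names the epoch as its "since" time because no earlier report exists. In Spark 2.1.x the real queue capacity appears to be controlled by the `spark.scheduler.listenerbus.eventqueue.size` setting (default 10000); verify this against the configuration documentation for the version in use.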



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

