spark-issues mailing list archives

From "Ari Gesher (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-19764) Executors hang with supposedly running task that are really finished.
Date Fri, 03 Mar 2017 14:16:45 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Gesher updated SPARK-19764:
-------------------------------

The driver produces no output; it just appears to hang.


> Executors hang with supposedly running task that are really finished.
> ---------------------------------------------------------------------
>
>                 Key: SPARK-19764
>                 URL: https://issues.apache.org/jira/browse/SPARK-19764
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 16.04 LTS
> OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
> Spark 2.0.2 - Spark Cluster Manager
>            Reporter: Ari Gesher
>         Attachments: driver-log-stderr.log, executor-2.log, netty-6153.jpg, SPARK-19764.tgz
>
>
> We've come across a job that won't finish. On a six-node cluster, each executor ends up with 5-7 tasks that are never marked as completed.
> Here's an excerpt from the web UI:
> ||Index||ID||Attempt||Status||Locality Level||Executor ID / Host||Launch Time||Duration||Scheduler Delay||Task Deserialization Time||GC Time||Result Serialization Time||Getting Result Time||Peak Execution Memory||Shuffle Read Size / Records||Errors||
> |105|1131|0|SUCCESS|PROCESS_LOCAL|4 / 172.31.24.171|2017/02/27 22:51:36|1.9 min|9 ms|4 ms|0.7 s|2 ms|6 ms|384.1 MB|90.3 MB / 572| |
> |106|1168|0|RUNNING|ANY|2 / 172.31.16.112|2017/02/27 22:53:25|6.5 h|0 ms|0 ms|1 s|0 ms|0 ms|384.1 MB|98.7 MB / 624| |
> However, the Executor reports the task as finished: 
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> As does the driver log:
> {noformat}
> 17/02/27 22:53:25 INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)
> 17/02/27 22:55:29 INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent via BlockManager)
> {noformat}
> The full log from this executor and the {{stderr}} from {{app-20170227223614-0001/2/stderr}} are attached.
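A minimal, hypothetical Python sketch for cross-checking this symptom: it scans executor log lines in the format quoted above and lists the task IDs (TIDs) that the web UI still shows as RUNNING but that the executor log reports as finished. The function names, and supplying the UI's RUNNING TIDs by hand, are assumptions for illustration; none of this is part of Spark. The patterns match only the two {{INFO Executor}} message shapes quoted in the report (the "Finished" pattern stops at "sent", so it should also match results sent directly to the driver).

```python
import re

# Patterns for the two Executor log messages quoted in this report, e.g.
#   "INFO Executor: Running task 106.0 in stage 5.0 (TID 1168)"
#   "INFO Executor: Finished task 106.0 in stage 5.0 (TID 1168). 2633558 bytes result sent ..."
RUNNING_RE = re.compile(r"INFO Executor: Running task \S+ in stage \S+ \(TID (\d+)\)")
FINISHED_RE = re.compile(
    r"INFO Executor: Finished task \S+ in stage \S+ \(TID (\d+)\)\. (\d+) bytes result sent"
)


def summarize_executor_log(lines):
    """Return (started, finished): a set of started TIDs and a dict TID -> result size in bytes."""
    started = set()
    finished = {}
    for line in lines:
        m = RUNNING_RE.search(line)
        if m:
            started.add(int(m.group(1)))
            continue
        m = FINISHED_RE.search(line)
        if m:
            finished[int(m.group(1))] = int(m.group(2))
    return started, finished


def reported_finished(lines, tids_running_in_ui):
    """Hypothetical helper: TIDs the UI shows as RUNNING that the executor log says finished."""
    _, finished = summarize_executor_log(lines)
    return sorted(t for t in tids_running_in_ui if t in finished)
```

Applied to the two log lines for TID 1168 above, with the UI's RUNNING set given as {1168}, {{reported_finished}} returns {{[1168]}}, i.e. the same mismatch described in this issue.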



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


