spark-issues mailing list archives

From "Mark Hamstra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-2019) Spark workers die/disappear when job fails for nearly any reason
Date Wed, 04 Jun 2014 15:35:03 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017776#comment-14017776 ]

Mark Hamstra commented on SPARK-2019:
-------------------------------------

Please don't leave the Affects Version/s selector on None.  As with the SO question, is this
an issue that you are seeing with Spark 0.9.0?  If so, then the version of Spark that you
are using is significantly out of date even on the 0.9 branch.  Several bug fixes are present
in the 0.9.1 release of Spark, which has been available for almost two months.  There are
a few more in the current 0.9.2-SNAPSHOT code, and many more in the recent 1.0.0 release.

> Spark workers die/disappear when job fails for nearly any reason
> ----------------------------------------------------------------
>
>                 Key: SPARK-2019
>                 URL: https://issues.apache.org/jira/browse/SPARK-2019
>             Project: Spark
>          Issue Type: Bug
>            Reporter: sam
>
> We either have to reboot all the nodes, or run 'sudo service spark-worker restart'
> across our cluster.  I don't think this should happen - the job failures are often
> not even that bad.  There is a 5 upvoted SO question here:
> http://stackoverflow.com/questions/22031006/spark-0-9-0-worker-keeps-dying-in-standalone-mode-when-job-fails
> We shouldn't be giving restart privileges to our devs, and therefore our sysadm has
> to frequently restart the workers.  When the sysadm is not around, there is nothing
> our devs can do.
> Many thanks
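
As a concrete illustration of the restart workaround described in the report, here is a
minimal sketch, assuming passwordless SSH from an admin host and a plain-text list of
worker hostnames in workers.txt; both the file name and the SSH setup are assumptions
for illustration, not details from the original report:

    # Restart the spark-worker service on every node listed in workers.txt.
    # The hostnames file and passwordless SSH access are assumed for illustration.
    while read -r host; do
        ssh "$host" 'sudo service spark-worker restart'
    done < workers.txt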



--
This message was sent by Atlassian JIRA
(v6.2#6252)
