spark-issues mailing list archives

From "Sridhar Rana (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-19920) How to capture reasons (log/trace/info/anything?) for "[ERROR]Driver 172.31.25.77:45151 disassociated! Shutting down."
Date Sat, 11 Mar 2017 15:57:04 GMT
Sridhar Rana created SPARK-19920:
------------------------------------

             Summary: How to capture reasons (log/trace/info/anything?) for "[ERROR]Driver 172.31.25.77:45151 disassociated! Shutting down."
                 Key: SPARK-19920
                 URL: https://issues.apache.org/jira/browse/SPARK-19920
             Project: Spark
          Issue Type: Question
          Components: Spark Submit, YARN
    Affects Versions: 2.1.0
            Reporter: Sridhar Rana
            Priority: Critical


We have an AWS Cloudera Spark environment: a YARN cluster with 1 driver node and 3 executor
nodes. We use Spark SQL heavily and Log4j for logging. Ours is a 24x7 long-running process
in an iterative loop. The process runs fine, but after several iterations (after a few hours)
it reports the error "[ERROR]Driver 172.31.25.77:45151 disassociated! Shutting down.". In
the same second there is this warning: "[WARN ]Error sending message [message = Heartbeat(1,[Lscala.Tuple2;@24452d3d,BlockManagerId(1,
ip-172-31-21-121.ec2.internal, 40378))] in 1 attempts". The Spark process is able to recover
from this failure, but it takes more time to finish that iteration. Other than that, there is not
much information available. How do we determine the cause of this error condition so that
appropriate measures can be taken? Can we capture it using Log4j?
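
One possible way to get the cause into the driver's Log4j output, sketched here under the assumption of Spark 2.1.0 on YARN with an existing SparkContext named sc (the listener class name below is made up for illustration): a SparkListener can log whatever reason Spark records when an executor is removed, which typically accompanies the disassociation and heartbeat failures described above.

import org.apache.log4j.Logger
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}

// Logs the reason Spark records whenever an executor is removed, so the
// cause of the disassociation lands in the driver's Log4j output.
class ExecutorLossLogger extends SparkListener {
  private val log = Logger.getLogger(getClass)

  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit = {
    log.warn(s"Executor ${removed.executorId} removed at ${removed.time}: ${removed.reason}")
  }
}

// Register on the existing SparkContext (assumed to be named sc).
sc.addSparkListener(new ExecutorLossLogger)

Since the warning points at heartbeat delivery failures, raising spark.network.timeout and/or spark.executor.heartbeatInterval via --conf on spark-submit may also widen the window for transient network hiccups; that is only a configuration suggestion, not something the report itself confirms.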



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

