spark-issues mailing list archives

From "Patrick Liu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-8790) BlockManager.reregister cause OOM
Date Thu, 02 Jul 2015 11:51:05 GMT
Patrick Liu created SPARK-8790:
----------------------------------

             Summary: BlockManager.reregister cause OOM
                 Key: SPARK-8790
                 URL: https://issues.apache.org/jira/browse/SPARK-8790
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Patrick Liu


We run Spark SQL 1.2.1 on YARN.

One SQL query consists of 100 tasks. Most of them finish in under 10 s, but one lasts for 16 minutes.

The web UI shows that the executor spent 15 minutes running GC before it ran out of memory (OOM).

The log shows that the executor first tried to contact the master to report a broadcast value.
However, the network was unavailable, so the executor could not reach the master and lost its
connection to it.
The master then asked the executor to re-register. While the executor was reporting all of its
blocks to the master (reportAllBlocks), the network was still unstable, so some requests timed out.
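To make the sequence above concrete, here is a toy replay in Python. It is a hypothetical model and not Spark code. The function name, the example block IDs, and the idea of modelling the network as a list of delivered/timed-out outcomes are all assumptions. The sketch only traces the report-fail, re-register, and re-report-every-block steps. It does not attempt to reproduce the OOM.

```python
from typing import Iterable, List, Tuple


def replay_reregistration(block_ids: List[str],
                          link_outcomes: Iterable[bool]) -> Tuple[bool, List[str], List[str]]:
    """Toy replay of the reported sequence (hypothetical; not Spark code).

    Each value drawn from `link_outcomes` models one executor-to-master
    message: True means it was delivered, False means it timed out.
    Once the outcomes run out, messages are treated as delivered.
    Returns (reregistered, reported_blocks, timed_out_blocks).
    """
    link = iter(link_outcomes)
    # 1. The executor tries to report a broadcast value to the master.
    if next(link, True):
        return False, [], []  # delivered: no re-registration in this toy model
    # 2. The report failed and the master lost the executor, so the master
    #    asks it to re-register and the executor re-reports every block it
    #    holds (the step the report calls reportAllBlocks).
    reported: List[str] = []
    timed_out: List[str] = []
    for block_id in block_ids:
        # 3. The link is still unstable, so some of these reports time out.
        if next(link, True):
            reported.append(block_id)
        else:
            timed_out.append(block_id)
    return True, reported, timed_out
```

For example, `replay_reregistration(["rdd_0_1", "broadcast_3", "rdd_0_2"], [False, True, False, True])` models a failed broadcast report followed by three block reports, the second of which times out.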

Finally, the executor ran out of memory (OOM).

Please take a look.

Attached is the detailed log.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


