spark-issues mailing list archives

From "Nishkam Ravi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-11278) PageRank fails with unified memory manager
Date Sat, 24 Oct 2015 00:05:27 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972155#comment-14972155 ]

Nishkam Ravi commented on SPARK-11278:
--------------------------------------

Yeah, the problem goes away with spark.memory.useLegacyMode = true.
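For anyone else hitting this, the legacy (pre-unified) memory manager can be re-enabled in code as well as at submit time. A minimal sketch, assuming a standalone driver (the app name is a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Fall back to the static memory manager used before the unified-memory
// commit; this works around the OOMs above rather than fixing them.
val conf = new SparkConf()
  .setAppName("pagerank-repro") // placeholder app name
  .set("spark.memory.useLegacyMode", "true")

val sc = new SparkContext(conf)
```

The equivalent at submit time is `--conf spark.memory.useLegacyMode=true` on the spark-submit command line.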

In the executor logs, I see large spills:
15/10/23 14:27:13 INFO collection.ExternalSorter: Thread 145 spilling in-memory map of 1477.0
MB to disk (1 time so far)

and OOM errors:
15/10/23 14:47:44 ERROR executor.Executor: Exception in task 99.0 in stage 0.0 (TID 94)
java.lang.OutOfMemoryError: Java heap space
	at org.apache.spark.util.collection.AppendOnlyMap.growTable(AppendOnlyMap.scala:218)
	at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.growTable(SizeTrackingAppendOnlyMap.scala:38)
	at org.apache.spark.util.collection.AppendOnlyMap.incrementSize(AppendOnlyMap.scala:204)
	at org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:151)
	at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:210)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Different values of spark.memory.fraction and spark.memory.storageFraction didn't help either.

With smaller executors the workload goes through, but with a 1.6x performance degradation
compared to a build without this commit. The spills are much smaller:
15/10/23 15:43:26 INFO collection.ExternalSorter: Thread 117 spilling in-memory map of 5.0
MB to disk (1 time so far)
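For completeness, the two fraction settings mentioned above can be varied the same way. A sketch of such a run, with illustrative values (these are the defaults, not the specific values tried above):

```scala
import org.apache.spark.SparkConf

// Under the unified manager, spark.memory.fraction is the share of heap
// given to execution + storage combined, and spark.memory.storageFraction
// is the part of that region protected from eviction by execution.
val conf = new SparkConf()
  .set("spark.memory.fraction", "0.6")        // illustrative value
  .set("spark.memory.storageFraction", "0.5") // illustrative value
```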



> PageRank fails with unified memory manager
> ------------------------------------------
>
>                 Key: SPARK-11278
>                 URL: https://issues.apache.org/jira/browse/SPARK-11278
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX, Spark Core
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Nishkam Ravi
>
> PageRank (6 nodes, 32GB input) runs very slowly and eventually fails with ExecutorLostFailure.
> Traced it back to the 'unified memory manager' commit from Oct 13th. Took a quick look at
> the code and couldn't see the problem (changes look pretty good). cc'ing [~andrewor14] [~vanzin],
> who may be able to spot the problem quickly. Can be reproduced by running PageRank on a large
> enough input dataset if needed. Sorry for not being of much help here.
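A minimal GraphX run of the kind the report describes might look like the following sketch (the input path and iteration count are placeholders, not taken from the report):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

val sc = new SparkContext(new SparkConf().setAppName("pagerank-repro"))

// Load an edge-list file (placeholder path) and run a fixed number of
// PageRank iterations; per the report, a large enough input should
// reproduce the slowdown/failure under the unified memory manager.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges.txt")
val ranks = graph.staticPageRank(numIter = 10).vertices
ranks.take(5).foreach(println)
```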



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

