spark-issues mailing list archives

From "Nishkam Ravi (JIRA)" <>
Subject [jira] [Commented] (SPARK-11278) PageRank fails with unified memory manager
Date Sat, 24 Oct 2015 00:05:27 GMT


Nishkam Ravi commented on SPARK-11278:

Yes, the problem goes away with spark.memory.useLegacyMode = true.
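For reference, this is how legacy mode can be switched on at submit time; a minimal sketch, where the application jar and main class are placeholders, not the actual workload used here:

```shell
# Re-enable the pre-unified (legacy) memory manager via configuration.
# The jar and class names below are placeholders for illustration only.
spark-submit \
  --conf spark.memory.useLegacyMode=true \
  --class com.example.PageRankJob \
  pagerank-job.jar
```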

In the executor logs, I see large spills:
15/10/23 14:27:13 INFO collection.ExternalSorter: Thread 145 spilling in-memory map of 1477.0 MB to disk (1 time so far)

and OOM errors:
15/10/23 14:47:44 ERROR executor.Executor: Exception in task 99.0 in stage 0.0 (TID 94)
java.lang.OutOfMemoryError: Java heap space
	at org.apache.spark.util.collection.AppendOnlyMap.growTable(AppendOnlyMap.scala:218)
	at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.growTable(SizeTrackingAppendOnlyMap.scala:38)
	at org.apache.spark.util.collection.AppendOnlyMap.incrementSize(AppendOnlyMap.scala:204)
	at org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:151)
	at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:210)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.executor.Executor$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$

Different values of spark.memory.fraction and spark.memory.storageFraction didn't help either.
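For completeness, the kind of tuning attempted looks like this; the specific values below are illustrative, not the exact ones tried (the unified manager's defaults are spark.memory.fraction=0.75 and spark.memory.storageFraction=0.5):

```shell
# Hypothetical tuning attempt: shrink the unified memory pool and the
# storage share within it. Values here are examples, not the ones tested.
spark-submit \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.3 \
  pagerank-job.jar
```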

With smaller executors the workload goes through, but with a 1.6x performance degradation (compared to builds without this commit). The spills are much smaller:
15/10/23 15:43:26 INFO collection.ExternalSorter: Thread 117 spilling in-memory map of 5.0 MB to disk (1 time so far)
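A "smaller executors" configuration along these lines is what makes the job complete; the sizes below are purely illustrative and not the exact settings used on this 6-node cluster:

```shell
# Hypothetical smaller-executor layout: more executors, each with less
# heap and fewer cores. All values are examples for illustration.
spark-submit \
  --num-executors 12 \
  --executor-memory 8g \
  --executor-cores 2 \
  pagerank-job.jar
```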

> PageRank fails with unified memory manager
> ------------------------------------------
>                 Key: SPARK-11278
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX, Spark Core
>    Affects Versions: 1.5.2, 1.6.0
>            Reporter: Nishkam Ravi
> PageRank (6-nodes, 32GB input) runs very slow and eventually fails with ExecutorLostFailure.
> Traced it back to the 'unified memory manager' commit from Oct 13th. Took a quick look at
> the code and couldn't see the problem (changes look pretty good). cc'ing [~andrewor14] [~vanzin]
> who may be able to spot the problem quickly. Can be reproduced by running PageRank on a large
> enough input dataset if needed. Sorry for not being of much help here.

This message was sent by Atlassian JIRA
