spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <>
Subject [jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory
Date Mon, 17 Nov 2014 23:44:33 GMT


Matei Zaharia commented on SPARK-4452:

How much of this gets fixed if you fix the elementsRead bug in ExternalSorter?

The problem with forcing data structures to spill is that it adds complexity to every
spillable data structure. I wonder if we can instead make the ShuffleMemoryManager give out
memory in smaller increments, so that threads check more often whether they should spill.
In addition, we can set a better minimum or maximum for each thread (e.g. always let it
ramp up to, say, 5 MB, or some fraction of the memory space).
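
To make the smaller-increments idea concrete, here is a toy Python model. It is only a sketch, not Spark's Scala code: the class name, the `chunk` and `floor` parameters, and the `fill` helper are all hypothetical, and the 1 MB / 5 MB defaults are illustrative. Each call grants at most one chunk, so the caller gets a chance to decide whether to spill after every grant, and each thread may ramp up to at least the floor as long as free memory allows.

```python
MB = 1024 * 1024


class IncrementalGrantManager:
    """Toy model (hypothetical, not Spark's API): small grants plus a per-thread floor."""

    def __init__(self, total, num_threads, chunk=1 * MB, floor=5 * MB):
        self.total = total
        self.num_threads = num_threads
        self.chunk = chunk  # largest grant handed out per call
        self.floor = floor  # a thread may ramp up to at least this much (if free)
        self.used = {}      # thread id -> bytes currently held

    def try_acquire(self, thread_id, wanted):
        """Grant at most one chunk; a grant of 0 tells the caller to spill."""
        held = self.used.get(thread_id, 0)
        free = self.total - sum(self.used.values())
        # Per-thread cap: the fair share, but never below the floor.
        cap = max(self.total // self.num_threads, self.floor)
        grant = max(0, min(wanted, self.chunk, free, cap - held))
        self.used[thread_id] = held + grant
        return grant

    def release_all(self, thread_id):
        """Return everything the thread holds to the pool."""
        return self.used.pop(thread_id, 0)


def fill(manager, thread_id, wanted):
    """Request `wanted` bytes chunk by chunk; return the number of non-zero grants."""
    calls = 0
    while wanted > 0:
        grant = manager.try_acquire(thread_id, wanted)
        if grant == 0:
            break  # the caller would spill here
        wanted -= grant
        calls += 1
    return calls
```

With 40 MB shared by 4 threads, a thread asking for 64 MB receives ten 1 MB grants and then a zero grant. With 16 threads the fair share falls below 5 MB, so the floor sets the cap instead.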

I do like the idea of making the ShuffleMemoryManager track limits per object. I actually
considered this when I wrote it and didn't do it, possibly because it would have made it
more complex to figure out when an object is done. But it seems like it should be
straightforward to add, as long as you also track which objects come from which thread so
that you can still call releaseMemoryForThisThread() to clean up.
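
A rough sketch of that bookkeeping, as a toy Python model rather than Spark's actual code: memory is accounted per object, and a thread-to-objects index keeps a thread-wide release possible. Every name here (`PerObjectMemoryTracker`, `release_object`, `release_memory_for_thread`) is hypothetical; only the idea of a per-thread cleanup call comes from the comment above.

```python
from collections import defaultdict


class PerObjectMemoryTracker:
    """Toy model (hypothetical): per-object accounting with a thread index."""

    def __init__(self, total):
        self.total = total
        self.by_object = {}                        # object id -> bytes held
        self.owner = {}                            # object id -> thread id
        self.objects_by_thread = defaultdict(set)  # thread id -> object ids

    def try_acquire(self, thread_id, object_id, wanted):
        """Grant up to `wanted` bytes to one object, recording its owning thread."""
        free = self.total - sum(self.by_object.values())
        grant = min(wanted, free)
        if grant > 0:
            self.by_object[object_id] = self.by_object.get(object_id, 0) + grant
            self.owner[object_id] = thread_id
            self.objects_by_thread[thread_id].add(object_id)
        return grant

    def release_object(self, object_id):
        """Release one object's memory, e.g. when that object is done."""
        freed = self.by_object.pop(object_id, 0)
        thread_id = self.owner.pop(object_id, None)
        if thread_id is not None:
            self.objects_by_thread[thread_id].discard(object_id)
        return freed

    def release_memory_for_thread(self, thread_id):
        """Release everything held by any object of this thread (task cleanup)."""
        freed = 0
        for object_id in self.objects_by_thread.pop(thread_id, set()):
            freed += self.by_object.pop(object_id, 0)
            self.owner.pop(object_id, None)
        return freed
```

The thread index is what keeps the cleanup path simple: the manager does not need to know when each object finishes, because releasing by thread still frees everything that thread's objects hold.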

> Shuffle data structures can starve others on the same thread for memory 
> ------------------------------------------------------------------------
>                 Key: SPARK-4452
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Tianshuo Deng
> When an Aggregator is used with ExternalSorter in a task, Spark will create many small
> files, which can cause a "too many files open" error during merging.
> This happens when using the sort-based shuffle. The issue is caused by multiple factors:
> 1. There seems to be a bug in setting the elementsRead variable in ExternalSorter, which
> renders the trackMemoryThreshold (defined in Spillable) useless for triggering spilling. The
> PR to fix it is
> 2. The current ShuffleMemoryManager does not work well when there are two spillable objects
> in a thread, which are ExternalSorter and ExternalAppendOnlyMap (used by Aggregator) in this
> case. Here is an example: because of map-side aggregation, ExternalAppendOnlyMap is
> created first to read the RDD. It may ask for as much memory as it can get, which is
> totalMem/numberOfThreads. Later, when ExternalSorter is created in the same thread, the
> ShuffleMemoryManager could refuse to allocate more memory to it, since the memory has
> already been given to the earlier requester (ExternalAppendOnlyMap). That causes the
> ExternalSorter to keep spilling small files (due to the lack of memory).
> I'm currently working on a PR to address these two issues. It will include the following:
> 1. The ShuffleMemoryManager should track not only the memory usage of each thread, but
> also the object that holds the memory.
> 2. The ShuffleMemoryManager should be able to trigger the spilling of a spillable object.
> This way, if a new object in a thread requests memory, the old occupant can be
> evicted/spilled, which avoids problem 2. Previously, spillable objects triggered spilling
> by themselves, so one might not spill even if another object in the same thread needed
> more memory. After this change, the ShuffleMemoryManager could trigger the spilling of an
> object if it needs to.
> 3. Make the iterator of ExternalAppendOnlyMap spillable. Previously, ExternalAppendOnlyMap
> returned a destructive iterator and could not be spilled after the iterator was returned.
> This should be changed so that even after the iterator is returned, the
> ShuffleMemoryManager can still spill it.
> Currently, I have a working branch in progress:

> I have already made change 3 and have a prototype of changes 1 and 2 to evict spillables
> from the memory manager; this is still in progress.
> I will send a PR when it's done.
> Any feedback or thoughts on this change are highly appreciated!
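
For readers following the quoted proposal, the eviction idea in change 2 can be sketched with a toy Python model. This is an illustration under stated assumptions, not the reporter's branch or Spark's API: `Spillable`, `EvictingMemoryManager`, and their methods are hypothetical names, a spill is modelled as simply dropping the held bytes, and an evicted object always spills everything it holds.

```python
class Spillable:
    """Hypothetical stand-in for a spillable structure such as a map or sorter."""

    def __init__(self, name):
        self.name = name
        self.held = 0         # bytes currently held in memory
        self.spill_count = 0  # how many times this object was spilled

    def spill(self):
        """Pretend to write the contents to disk; return the bytes released."""
        freed, self.held = self.held, 0
        self.spill_count += 1
        return freed


class EvictingMemoryManager:
    """Toy manager that spills older occupants of a thread to serve a new request."""

    def __init__(self, total, num_threads):
        self.per_thread_cap = total // num_threads
        self.objects = {}  # thread id -> list of Spillable, oldest first

    def try_acquire(self, thread_id, requester, wanted):
        objs = self.objects.setdefault(thread_id, [])
        if not any(o is requester for o in objs):
            objs.append(requester)
        # Evict other occupants of this thread, oldest first, until the request fits.
        for other in objs:
            free = self.per_thread_cap - sum(o.held for o in objs)
            if free >= wanted:
                break
            if other is not requester and other.held > 0:
                other.spill()
        free = self.per_thread_cap - sum(o.held for o in objs)
        grant = min(wanted, free)
        requester.held += grant
        return grant
```

In the scenario from problem 2, a map that already holds the thread's whole share is spilled once when the sorter asks for memory, so the sorter gets its grant instead of repeatedly spilling small files.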

This message was sent by Atlassian JIRA
