spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory
Date Tue, 18 Nov 2014 19:57:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216691#comment-14216691 ]

Matei Zaharia commented on SPARK-4452:
--------------------------------------

BTW, I've thought about this more and here's what I'd suggest: try a version where each object
is allowed to ramp up to a certain size (say 5 MB) before being subject to the limit, and if
that doesn't work, then go for the forced-spilling approach. The reason is that as soon as N
objects are active, the ShuffleMemoryManager won't let any object ramp up to more than 1/N of
the pool, so each object just has to fill its current quota and stop. This means that scenarios
with very little free memory should only happen at the beginning, when tasks start up. If we
can make this work, we avoid a lot of the concurrency problems that forced spilling would bring.
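
To make that concrete, here is a minimal sketch of the idea, assuming a fixed per-object floor;
the class, method shape, and 5 MB constant are illustrative assumptions, not the actual
ShuffleMemoryManager code:

    // Simplified sketch in Scala, not Spark's ShuffleMemoryManager.
    // Assumption: each consumer may grow to a 5 MB floor before the
    // fair-share cap of maxMemory / numActiveObjects applies.
    class SketchShuffleMemoryManager(maxMemory: Long) {
      private val minPerObject = 5L * 1024 * 1024                      // hypothetical floor
      private val usage = scala.collection.mutable.Map[Long, Long]()   // objectId -> bytes held

      def tryToAcquire(objectId: Long, numBytes: Long): Long = synchronized {
        usage.getOrElseUpdate(objectId, 0L)
        val fairShare = maxMemory / usage.size
        val cap = math.max(fairShare, minPerObject)  // the floor wins when 1/N is tiny
        val current = usage(objectId)
        val free = maxMemory - usage.values.sum
        val granted = math.max(0L, math.min(numBytes, math.min(cap - current, free)))
        usage(objectId) = current + granted
        granted
      }

      def releaseAll(objectId: Long): Unit = synchronized { usage -= objectId }
    }

With this shape, a newly created spillable can always reach 5 MB (as long as that much memory is
free), even when many objects are already active, so it does not spill on every tiny insert.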

Another improvement would be to make the Spillables request less than 2x their current memory
when they ramp up, e.g. 1.5x. They would then make more requests, but that leads to slower
ramp-up and gives other threads more of a chance to grab memory. But I think this will have
less impact than simply adding that minimum ramp-up amount.
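
As a small illustration of the second point, here is a hedged sketch of the ramp-up request with
the growth factor pulled out as a parameter; this is not the actual Spillable code, just the
arithmetic:

    // Assumption: a Spillable whose in-memory size has reached its current
    // threshold asks the memory manager for enough to reach
    // growthFactor * currentMemory.
    def amountToRequest(currentMemory: Long, currentThreshold: Long,
                        growthFactor: Double): Long =
      (growthFactor * currentMemory).toLong - currentThreshold

    // At 64 MB, a 2.0x factor requests another 64 MB in one step, while 1.5x
    // requests about 32 MB, leaving more memory free for other threads between
    // requests at the cost of making requests more often.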

> Shuffle data structures can starve others on the same thread for memory 
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4452
>                 URL: https://issues.apache.org/jira/browse/SPARK-4452
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Tianshuo Deng
>            Assignee: Tianshuo Deng
>            Priority: Blocker
>
> When an Aggregator is used with an ExternalSorter in a task, Spark creates many small files, which can cause a "too many open files" error during merging.
> Currently, ShuffleMemoryManager does not work well when there are two spillable objects in a thread, which in this case are ExternalSorter and ExternalAppendOnlyMap (used by the Aggregator). Here is an example: because of map-side aggregation, an ExternalAppendOnlyMap is created first to read the RDD. It may ask for as much memory as it can get, which is totalMem/numberOfThreads. Later, when an ExternalSorter is created in the same thread, the ShuffleMemoryManager can refuse to allocate more memory to it, since the memory has already been given to the previously requesting object (the ExternalAppendOnlyMap). That causes the ExternalSorter to keep spilling small files due to the lack of memory.
> I'm currently working on a PR to address these two issues. It will include the following changes (changes 1 and 2 are sketched after this description):
> 1. The ShuffleMemoryManager should track not only the memory usage of each thread, but also which object holds the memory.
> 2. The ShuffleMemoryManager should be able to trigger the spilling of a spillable object. That way, when a new object in a thread requests memory, the old occupant can be evicted/spilled. Previously, spillable objects triggered spilling themselves, so one might not spill even when another object in the same thread needed more memory. After this change, the ShuffleMemoryManager can trigger the spilling of an object whenever it needs to.
> 3. Make the iterator of ExternalAppendOnlyMap spillable. Previously, ExternalAppendOnlyMap returned a destructive iterator and could not be spilled once the iterator had been returned. This should be changed so that the ShuffleMemoryManager can still spill it even after the iterator is returned.
> I currently have a working branch in progress: https://github.com/tsdeng/spark/tree/enhance_memory_manager. Change 3 is done, and a prototype of changes 1 and 2 (evicting spillables from the memory manager) is still in progress. I will send a PR when it's done.
> Any feedback or thoughts on this change are highly appreciated!
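
To illustrate the scenario above and the idea behind changes 1 and 2 (tracking memory per object
and letting the manager trigger spills), here is a rough, hypothetical sketch in Scala; the names
and structure are assumptions for illustration, not the code in the enhance_memory_manager branch:

    // Hypothetical sketch only.
    trait ForcibleSpillable {
      def forceSpill(): Long                      // spill to disk, return bytes released
    }

    class SketchMemoryManager(maxMemory: Long) {
      // Change 1: track memory per (thread, spillable), not just per thread.
      private val held = scala.collection.mutable.Map[(Long, ForcibleSpillable), Long]()

      def acquire(threadId: Long, who: ForcibleSpillable, bytes: Long): Long = synchronized {
        def free = maxMemory - held.values.sum
        if (free < bytes) {
          // Change 2: the manager, not the spillable, decides to spill. Evict the
          // largest other holder on the same thread so a newcomer is not starved
          // into producing many tiny spill files.
          held.filter { case ((tid, s), _) => tid == threadId && s != who }
              .toSeq.sortBy(-_._2).headOption.foreach { case (key, _) =>
                val released = key._2.forceSpill()
                held(key) = math.max(0L, held(key) - released)
              }
        }
        val granted = math.max(0L, math.min(bytes, free))
        held((threadId, who)) = held.getOrElse((threadId, who), 0L) + granted
        granted
      }
    }

Under something like this, when the ExternalSorter in the example asks for memory, the manager
would force the ExternalAppendOnlyMap that holds the thread's quota to spill once, instead of
leaving the sorter to spill many small files.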


