spark-issues mailing list archives

From "Dr. Christian Betz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5081) Shuffle write increases
Date Fri, 13 Feb 2015 11:01:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319948#comment-14319948 ]

Dr. Christian Betz commented on SPARK-5081:
-------------------------------------------

From SPARK-5715:
I see a *factor-four performance loss* in my Spark jobs when migrating from Spark 1.1.0 to Spark 1.2.0 or 1.2.1.

Also, I see an *increase in the size of shuffle writes* (which is also reported by Kevin Jung on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tt20894.html).


Together with this I experience a *huge number of disk spills*.



I'm seeing these symptoms with my job under the following configurations:

* Spark 1.2.0 with Sort-based Shuffle 
* Spark 1.2.0 with Hash-based Shuffle 
* Spark 1.2.1 with Sort-based Shuffle 

All three combinations show the same behavior, in contrast to Spark 1.1.0.
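
For reference, switching the shuffle manager between these runs boils down to a single setting. A minimal sketch (the app name and job body are placeholders, not my actual job):

{code}
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleManagerDemo { // hypothetical driver, for illustration only
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MyJob") // placeholder
      // "sort" is the default shuffle manager in 1.2.x; "hash" was the default in 1.1.0
      .set("spark.shuffle.manager", "hash") // or "sort"
    val sc = new SparkContext(conf)
    // ... actual job elided ...
    sc.stop()
  }
}
{code}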

In Spark 1.1.0 my job runs for about an hour; in Spark 1.2.x it runs for almost four hours. The configuration is otherwise identical: I only added org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102.
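
For completeness, the registrator change looks roughly like this (MyKryoRegistrator is a placeholder name; since CompressedMapStatus is private[spark] in 1.2, I register it by name via Class.forName rather than classOf):

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Placeholder registrator; the application classes normally registered
// here are omitted.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Needed on 1.2.x with Kryo serialization enabled, see SPARK-5102:
    kryo.register(Class.forName("org.apache.spark.scheduler.CompressedMapStatus"))
  }
}
{code}

It is wired up via the spark.kryo.registrator setting in the SparkConf.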



As a consequence (I think, though the causality might be different), I see lots and lots of disk spills.
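
The spill-related knobs I am aware of in 1.2.x are sketched below; the values are examples only, not recommendations, and I have not verified whether they change the picture for this job:

{code}
import org.apache.spark.SparkConf

// Example values only; the 1.2.x defaults are 0.2 and true respectively.
val conf = new SparkConf()
  .set("spark.shuffle.memoryFraction", "0.4") // heap fraction for in-memory shuffle maps
  .set("spark.shuffle.spill", "true")         // disabling spills trades disk I/O for OOM risk
{code}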


I cannot provide a small test case, but maybe the log entries for a single worker thread can help someone investigate this (see below).


I will also open an issue if nobody stops me by providing an answer ;)

Any help will be greatly appreciated; otherwise I'm stuck with Spark 1.1.0, as quadrupling the runtime is not an option.

Sincerely, 

Chris 



2015-02-09T14:06:06.328+01:00 INFO org.apache.spark.executor.Executor Running task 9.0 in stage 18.0 (TID 300) Executor task launch worker-18
2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.CacheManager Partition rdd_35_9 not found, computing it Executor task launch worker-18
2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:06:07.396+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(2582904) called with curMem=300174944, maxMe... Executor task launch worker-18
2015-02-09T14:06:07.397+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_35_9 stored as bytes in memory (estimated size 2.5... Executor task launch worker-18
2015-02-09T14:06:07.398+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_35_9 Executor task launch worker-18
2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.CacheManager Partition rdd_38_9 not found, computing it Executor task launch worker-18
2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
2015-02-09T14:06:07.400+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:06:07.567+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(944848) called with curMem=302757848, maxMem... Executor task launch worker-18
2015-02-09T14:06:07.568+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_38_9 stored as values in memory (estimated size 92... Executor task launch worker-18
2015-02-09T14:06:07.569+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_38_9 Executor task launch worker-18
2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18
2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18
2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.CacheManager Partition rdd_41_9 not found, computing it Executor task launch worker-18
2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 3 non-empty blocks out of 10 blocks Executor task launch worker-18
2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(0) called with curMem=307529127, maxMem=9261... Executor task launch worker-18
2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_41_9 stored as bytes in memory (estimated size 0.0... Executor task launch worker-18
2015-02-09T14:06:38.946+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_41_9 Executor task launch worker-18
2015-02-09T14:06:38.946+01:00 WARN org.apache.spark.storage.BlockManager Block rdd_41_9 replicated to only 0 peer(s) instead of 1 pee... Executor task launch worker-18
2015-02-09T14:06:39.088+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_3_9 locally Executor task launch worker-18
2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.CacheManager Partition rdd_7_9 not found, computing it Executor task launch worker-18
2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks out of 1 blocks Executor task launch worker-18
2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(38144) called with curMem=307529151, maxMem=... Executor task launch worker-18
2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_7_9 stored as values in memory (estimated size 37.... Executor task launch worker-18
2015-02-09T14:06:41.404+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_7_9 Executor task launch worker-18
2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.CacheManager Partition rdd_73_9 not found, computing it Executor task launch worker-18
2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks out of 1 blocks Executor task launch worker-18
2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(826250) called with curMem=307567295, maxMem... Executor task launch worker-18
2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_73_9 stored as values in memory (estimated size 80... Executor task launch worker-18
2015-02-09T14:07:00.186+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_73_9 Executor task launch worker-18
2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_35_9 locally Executor task launch worker-18
2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_38_9 locally Executor task launch worker-18
2015-02-09T14:07:00.194+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18
2015-02-09T14:07:00.194+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18
2015-02-09T14:07:17.967+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 670.2 MB to disk (1 tim... Executor task launch worker-18
2015-02-09T14:07:46.716+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_41_9 locally Executor task launch worker-18
2015-02-09T14:07:47.603+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_3_9 locally Executor task launch worker-18
2015-02-09T14:07:47.897+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (1 time ... Executor task launch worker-18
2015-02-09T14:07:48.270+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (2 times... Executor task launch worker-18
2015-02-09T14:07:48.727+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (3 times... Executor task launch worker-18
2015-02-09T14:07:49.021+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (4 times... Executor task launch worker-18


> Shuffle write increases
> -----------------------
>
>                 Key: SPARK-5081
>                 URL: https://issues.apache.org/jira/browse/SPARK-5081
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.2.0
>            Reporter: Kevin Jung
>
> The size of shuffle write shown in the Spark web UI differs significantly when I execute the same Spark job with the same input data on Spark 1.1 and Spark 1.2.
> At the sortBy stage, the size of shuffle write is 98.1MB in Spark 1.1 but 146.9MB in Spark 1.2.
> I set the spark.shuffle.manager option to hash because its default value changed, but Spark 1.2 still writes more shuffle output than Spark 1.1.
> This can increase disk I/O overhead dramatically as the input file gets bigger, causing jobs to take more time to complete.
> For an input of about 100GB, for example, the size of shuffle write is 39.7GB in Spark 1.1 but 91.0GB in Spark 1.2.
> spark 1.1
> ||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
> |9|saveAsTextFile| |1169.4KB| |
> |12|combineByKey| |1265.4KB|1275.0KB|
> |6|sortByKey| |1276.5KB| |
> |8|mapPartitions| |91.0MB|1383.1KB|
> |4|apply| |89.4MB| |
> |5|sortBy|155.6MB| |98.1MB|
> |3|sortBy|155.6MB| | |
> |1|collect| |2.1MB| |
> |2|mapValues|155.6MB| |2.2MB|
> |0|first|184.4KB| | |
> spark 1.2
> ||Stage Id||Description||Input||Shuffle Read||Shuffle Write||
> |12|saveAsTextFile| |1170.2KB| |
> |11|combineByKey| |1264.5KB|1275.0KB|
> |8|sortByKey| |1273.6KB| |
> |7|mapPartitions| |134.5MB|1383.1KB|
> |5|zipWithIndex| |132.5MB| |
> |4|sortBy|155.6MB| |146.9MB|
> |3|sortBy|155.6MB| | |
> |2|collect| |2.0MB| |
> |1|mapValues|155.6MB| |2.2MB|
> |0|first|184.4KB| | |


