spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-26114) Memory leak of PartitionedPairBuffer when coalescing after repartitionAndSortWithinPartitions
Date Wed, 28 Nov 2018 12:24:00 GMT


Wenchen Fan resolved SPARK-26114.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.1
                   3.0.0

Issue resolved by pull request 23083
[https://github.com/apache/spark/pull/23083]

> Memory leak of PartitionedPairBuffer when coalescing after repartitionAndSortWithinPartitions
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26114
>                 URL: https://issues.apache.org/jira/browse/SPARK-26114
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.2, 2.3.2, 2.4.0
>         Environment: Spark 3.0.0-SNAPSHOT (master branch)
> Scala 2.11
> Yarn 2.7
>            Reporter: Sergey Zhemzhitsky
>            Assignee: Sergey Zhemzhitsky
>            Priority: Major
>             Fix For: 3.0.0, 2.4.1
>
>         Attachments: run1-noparams-dominator-tree-externalsorter-gc-root.png, run1-noparams-dominator-tree-externalsorter.png,
run1-noparams-dominator-tree.png
>
>
> Trying to use _coalesce_ after shuffle-oriented transformations leads to OutOfMemoryErrors or to _Container killed by YARN for exceeding memory limits. X GB of Y GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead_.
> Discussion is [here|http://apache-spark-developers-list.1001551.n3.nabble.com/Coalesce-behaviour-td25289.html].
> The error happens when specifying a fairly small number of partitions in the _coalesce_ call.
> *How to reproduce?*
> # Start spark-shell
> {code:bash}
> spark-shell \
>   --num-executors=5 \
>   --executor-cores=2 \
>   --master=yarn \
>   --deploy-mode=client \
>   --conf spark.executor.memoryOverhead=512 \
>   --conf spark.executor.memory=1g \
>   --conf spark.dynamicAllocation.enabled=false \
>   --conf spark.executor.extraJavaOptions='-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dio.netty.noUnsafe=true'
> {code}
> Please note the _-Dio.netty.noUnsafe=true_ property. Preventing off-heap memory usage currently seems to be the only way to control the amount of memory used for transferring shuffle data.
> Also note that the total number of cores allocated for the job is 5x2=10.
> # Then generate some test data
> {code:scala}
> import org.apache.hadoop.io._ 
> import org.apache.hadoop.io.compress._ 
> import org.apache.commons.lang._ 
> import org.apache.spark._ 
> // generate 100M records of sample data
> sc.makeRDD(1 to 1000, 1000)
>   .flatMap(item => (1 to 100000)
>     .map(i => new Text(RandomStringUtils.randomAlphanumeric(3).toLowerCase) -> new Text(RandomStringUtils.randomAlphanumeric(1024))))
>   .saveAsSequenceFile("/tmp/random-strings", Some(classOf[GzipCodec]))
> {code}
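For a rough sense of the data volume the generator above produces (assuming ~3-byte keys and 1024-byte values, ignoring record overhead and compression; illustrative arithmetic only):

```scala
// Rough, illustrative estimate of the uncompressed dataset size.
val records = 1000L * 100000L      // 1000 tasks x 100000 records = 100M records
val bytesPerRecord = 3L + 1024L    // ~3-byte key + 1024-byte value (assumed, no overhead)
val totalGiB = records * bytesPerRecord / (1L << 30)
println(s"$records records, ~$totalGiB GiB uncompressed")
```

So the shuffle in the next step moves on the order of 100 GiB through executors that each have only 1 GiB of heap plus 512 MiB of overhead.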
> # Run the sample job
> {code:scala}
> import org.apache.hadoop.io._
> import org.apache.spark._
> import org.apache.spark.storage._
> val rdd = sc.sequenceFile("/tmp/random-strings", classOf[Text], classOf[Text])
> rdd
>   .map(item => item._1.toString -> item._2.toString)
>   .repartitionAndSortWithinPartitions(new HashPartitioner(1000))
>   .coalesce(10, shuffle = false)
>   .count
> {code}
> Note that the number of coalesced partitions is equal to the total number of cores allocated to the job.
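For scale: coalescing 1000 shuffle partitions into 10 tasks means each task drains on the order of 100 sorted parent partitions sequentially. A minimal sketch of such a grouping (a simplified round-robin; Spark's actual DefaultPartitionCoalescer also weighs locality, so the real assignment differs):

```scala
// Illustrative only: simplified round-robin grouping of parent partitions
// into coalesced tasks, not Spark's actual coalescing algorithm.
val parents = 0 until 1000           // shuffle output partitions
val tasks = parents.groupBy(_ % 10)  // 10 coalesced tasks
assert(tasks.size == 10)
assert(tasks.values.forall(_.size == 100))
```

Each of those ~100 parent partitions is read through an ExternalSorter backed by a PartitionedPairBuffer, which is why per-partition memory that is not released eagerly accumulates within a single task.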
> Here is the dominator tree from the heap dump:
>  !run1-noparams-dominator-tree.png|width=700!
> There are 4 instances of ExternalSorter, although only 2 tasks are running concurrently per executor.
>  !run1-noparams-dominator-tree-externalsorter.png|width=700! 
> And the paths to the GC root of an already stopped ExternalSorter:
>  !run1-noparams-dominator-tree-externalsorter-gc-root.png|width=700! 
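The mechanics behind those extra live ExternalSorter instances can be sketched without Spark: a non-shuffle coalesce concatenates several parent-partition iterators into one task, so any resource tied to a parent iterator stays reachable until the whole task ends unless it is released eagerly. In the sketch below, PartitionBuffer is a hypothetical stand-in for an ExternalSorter's PartitionedPairBuffer; all names are illustrative, not Spark's actual API.

```scala
// Hypothetical stand-in for a per-partition buffer (e.g. PartitionedPairBuffer).
final class PartitionBuffer(val id: Int) {
  var released = false
  def records: Iterator[Int] = Iterator.tabulate(3)(i => id * 10 + i)
  def release(): Unit = released = true
}

val buffers = Seq.tabulate(4)(new PartitionBuffer(_))

// One coalesced task drains several parent partitions sequentially.
// Without eager release, every buffer stays reachable until the task ends:
val drained = buffers.iterator.flatMap(_.records).size
assert(drained == 12)
assert(buffers.count(!_.released) == 4)  // all four buffers still held

// Releasing each buffer as soon as it is exhausted bounds the footprint,
// roughly what a completion callback on the per-partition iterator achieves:
val eager = buffers.iterator.flatMap { b =>
  b.records ++ { b.release(); Iterator.empty }  // ++ is by-name, runs after drain
}
assert(eager.size == 12)
assert(buffers.forall(_.released))
```

This is only a model of the failure mode the heap dumps show; the actual fix in pull request 23083 is the authoritative account of how the buffers are freed.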



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
