spark-issues mailing list archives

From "Gabor Feher (JIRA)" <>
Subject [jira] [Updated] (SPARK-15796) Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
Date Sun, 12 Jun 2016 18:27:21 GMT


Gabor Feher updated SPARK-15796:
    Attachment: baseline.txt

I've run the above code in four different setups:

memfrac06.txt  (spark.memory.fraction=0.6):           total time = 45s,  GC time = 2s
memfrac063.txt (spark.memory.fraction=0.63):          total time = 56s,  GC time = 11s
memfrac066.txt (spark.memory.fraction=0.66):          total time = 270s, GC time = 225s
baseline.txt   (spark.memory.fraction=0.75, default): total time = 270s, GC time = 233s

(Note that the numbers are from single runs, so there is some noise.)
Note also that spark.memory.fraction=0.66 is not low enough to fix the problem.
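For reference, the four runs above vary only spark.memory.fraction; they can be reproduced with something like the following wrapper around the spark-submit command from the issue description (a sketch; the log file names here are illustrative, not the attachment names):

```shell
# Hypothetical driver for the four measurement runs; assumes the repro jar
# from the issue description has been built with `sbt package`.
for frac in 0.6 0.63 0.66 0.75; do
  ~/spark-1.6.0/bin/spark-submit \
    --class "CacheDemoApp" \
    --master "local[2]" \
    --driver-memory 3g \
    --conf "spark.memory.fraction=${frac}" \
    --driver-java-options "-XX:+PrintGCDetails" \
    target/scala-2.10/simple-project_2.10-1.0.jar \
    > "run-${frac}.log" 2>&1
done
```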

Here is an example line from the output of the baseline case (spark.memory.fraction=0.75):
[Full GC [PSYoungGen: 350208K->0K(699392K)] [ParOldGen: 2028768K->2028681K(2097152K)]
2378976K->2028681K(2796544K) [PSPermGen: 44023K->44023K(91136K)], 1.9907560 secs] [Times:
user=6.62 sys=0.02, real=1.99 secs] 

See that the old generation is nearly full. I think that once it fills past a threshold close
to full, full GC gets triggered repeatedly, and that causes the slowdown. This would explain
why using exactly 0.66 as the fraction does not fix the issue while lower values do.
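A back-of-the-envelope calculation is consistent with this. In Spark 1.6 the unified memory manager applies spark.memory.fraction to the heap minus a fixed 300 MB reservation (RESERVED_SYSTEM_MEMORY_BYTES), and with the JVM default NewRatio=2 the old generation gets 2/3 of the 3 GB driver heap. A sketch of the numbers (approximate, since the JVM reports a max heap slightly below the nominal 3 GB):

```scala
// Rough comparison of Spark's storage upper limit against the old gen size,
// assuming a 3 GB heap, Spark 1.6's fixed 300 MB reservation, and NewRatio=2.
object MemFracCheck {
  val heapMb = 3072.0            // --driver-memory 3g
  val reservedMb = 300.0         // Spark 1.6 RESERVED_SYSTEM_MEMORY_BYTES
  val oldGenMb = heapMb * 2 / 3  // NewRatio=2 -> old gen is 2/3 of the heap

  def storageLimitMb(fraction: Double): Double = (heapMb - reservedMb) * fraction

  def main(args: Array[String]): Unit = {
    for (f <- Seq(0.6, 0.63, 0.66, 0.75)) {
      val limit = storageLimitMb(f)
      println(f"fraction=$f%.2f  storage limit=$limit%.0f MB  " +
        f"old gen=$oldGenMb%.0f MB  (${limit / oldGenMb * 100}%.0f%% of old gen)")
    }
  }
}
```

At 0.75 the storage limit (about 2079 MB) already exceeds the 2048 MB old generation, and at 0.66 it sits at roughly 89% of it, which fits the observation that 0.66 still triggers near-constant full GC while 0.6 does not.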

p.s.: I've seen [~andrewor14]'s presentation at the Spark Summit:
I think he mentioned that one difference between storage and execution memory is that
execution memory is short-lived while storage memory is long-lived. So one idea could be to
introduce a new config value, e.g. spark.memory.cacheUpperLimit, which is always kept below
the old generation ratio, while keeping spark.memory.fraction at a higher level to allow for
more execution memory. I am not sure whether this would work well, but it might be one thing
to try and measure.

> Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
> ---------------------------------------------------------------------------------------
>                 Key: SPARK-15796
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.6.0, 1.6.1
>            Reporter: Gabor Feher
>            Priority: Minor
>         Attachments: baseline.txt, memfrac06.txt, memfrac063.txt, memfrac066.txt
> While debugging performance issues in a Spark program, I've found a simple way to slow
down Spark 1.6 significantly by filling the RDD memory cache. This seems to be a regression,
because setting "spark.memory.useLegacyMode=true" fixes the problem. Here is a repro that
is just a simple program that fills the memory cache of Spark using a MEMORY_ONLY cached RDD
(but of course this comes up in more complex situations, too):
> {code}
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkConf
> import org.apache.spark.storage.StorageLevel
> object CacheDemoApp { 
>   def main(args: Array[String]) {
>     val conf = new SparkConf().setAppName("Cache Demo Application")                 
>     val sc = new SparkContext(conf)
>     val startTime = System.currentTimeMillis()
>     val cacheFiller = sc.parallelize(1 to 500000000, 1000)                          
>       .mapPartitionsWithIndex {
>         case (ix, it) =>
>           println(s"CREATE DATA PARTITION ${ix}")                                   
>           val r = new scala.util.Random(ix)
>, r.nextLong))
>       }
>     cacheFiller.persist(StorageLevel.MEMORY_ONLY)
>     cacheFiller.foreach(identity)
>     val finishTime = System.currentTimeMillis()
>     val elapsedTime = (finishTime - startTime) / 1000
>     println(s"TIME= $elapsedTime s")
>   }
> }
> {code}
> If I call it the following way, it completes in around 5 minutes on my laptop, often
stopping for slow full GC cycles. I can also see with jvisualvm (Visual GC plugin) that the
JVM's old generation is 96.8% full.
> {code}
> sbt package
> ~/spark-1.6.0/bin/spark-submit \
>   --class "CacheDemoApp" \
>   --master "local[2]" \
>   --driver-memory 3g \
>   --driver-java-options "-XX:+PrintGCDetails" \
>   target/scala-2.10/simple-project_2.10-1.0.jar
> {code}
> If I add any one of the flags below, the run time drops to around 40-50 seconds, and the
difference comes from the drop in GC time:
>   --conf "spark.memory.fraction=0.6"
> OR
>   --conf "spark.memory.useLegacyMode=true"
> OR
>   --driver-java-options "-XX:NewRatio=3"
> All the other cache types except DISK_ONLY produce similar symptoms. It looks like the
problem is that the amount of data Spark wants to store long-term ends up larger than the
JVM's old generation, and this triggers full GC repeatedly.
> I did some research:
> * In Spark 1.6, spark.memory.fraction is the upper limit on cache size. It defaults to 0.75.
> * In Spark 1.5, spark.storage.memoryFraction is the upper limit on cache size. It defaults
to 0.6.
> * The Spark 1.5 documentation even says that it shouldn't be bigger than the size of the
old generation.
> * On the other hand, OpenJDK's default NewRatio is 2, which means an old generation size
of 66% of the heap. Hence the default value in Spark 1.6 contradicts this advice.
> * The tuning guide recommends that if the old generation is running close to full, then
setting spark.memory.storageFraction to a lower value should help. I have tried
spark.memory.storageFraction=0.1, but it still doesn't fix the issue. This is not a surprise:
the configuration documentation explains that storageFraction is not an upper limit but more
like a lower bound on the size of Spark's cache. The real upper limit is spark.memory.fraction.
> To sum up my questions/issues:
> * At least the documentation should be fixed. Maybe the old generation size should also be
mentioned in configuration.html near spark.memory.fraction.
> * Is it a goal for Spark to support heavy caching with default parameters and without
GC breakdown? If so, then better default values are needed.

This message was sent by Atlassian JIRA
