spark-user mailing list archives

From unk1102
Subject Why dataframe.persist(StorageLevels.MEMORY_AND_DISK_SER) hangs for long time?
Date Thu, 08 Oct 2015 18:27:40 GMT
Hi, as recommended, I am caching my Spark job's DataFrame with
dataframe.persist(StorageLevels.MEMORY_AND_DISK_SER), but in the Spark job UI
I see that the persist stage runs for a very long time, showing 10 GB of
shuffle read and 5 GB of shuffle write. It takes too long to finish, and
because of that my Spark job sometimes times out or throws an OOM error, and
the executors get killed by YARN. I am using Spark 1.4.1 with all sorts of
optimizations, such as Tungsten and Kryo. I have set storage.memoryFraction to
0.2 and storage.shuffle to 0.2 as well. My data is huge, around 1 TB, and I am
using the default 200 partitions for spark.sql.shuffle.partitions. Please
help, I am clueless. Please guide.
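
For scale, here is a rough sketch of the arithmetic behind those numbers. It assumes "around 1 TB" means 10^12 bytes and that the data would be spread evenly over the 200 default partitions. The UI figures quoted above suggest that not all of it passes through this shuffle. The class name PartitionSizing and the 128 MB target are hypothetical, chosen only for illustration, and are not Spark settings or recommendations.

```java
// Back-of-the-envelope partition sizing; all figures are illustrative assumptions.
class PartitionSizing {
    /** Average bytes per partition if totalBytes were spread evenly. */
    static long bytesPerPartition(long totalBytes, int partitions) {
        return totalBytes / partitions;
    }

    /** Partitions needed so that each holds at most targetBytes (ceiling division). */
    static long partitionsForTarget(long totalBytes, long targetBytes) {
        return (totalBytes + targetBytes - 1) / targetBytes;
    }

    public static void main(String[] args) {
        long totalBytes = 1_000_000_000_000L; // "around 1 TB", taken as 10^12 bytes
        int defaultPartitions = 200;          // default spark.sql.shuffle.partitions
        long targetBytes = 128_000_000L;      // hypothetical 128 MB per-partition target

        // Average partition size with the default partition count.
        System.out.println(bytesPerPartition(totalBytes, defaultPartitions));
        // Partition count that would keep each partition under the target size.
        System.out.println(partitionsForTarget(totalBytes, targetBytes));
    }
}
```

Under these assumptions, each of the 200 partitions would average about 5 GB, which gives a sense of how much data a single task might have to serialize or spill.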
