spark-user mailing list archives

From Sampo Niskanen <sampo.niska...@wellmo.com>
Subject Caching causes later actions to get stuck
Date Fri, 30 Oct 2015 14:57:03 GMT
Hi,

I'm facing a problem where Spark is able to perform an action on a cached
RDD correctly the first time it is run, but running it immediately
afterwards (or an action depending on that RDD) causes it to get stuck.

I'm using a MongoDB connector to fetch all documents from a collection
into an RDD and caching it (though according to the warnings below it
doesn't fully fit in memory).  The first action on it always succeeds, but
later actions hang.  I just upgraded from Spark 0.9.x to 1.5.1, and didn't
have this problem with the older version.


The output I get:


scala> analyticsRDD.cache
res10: analyticsRDD.type = MapPartitionsRDD[84] at map at Mongo.scala:69

scala> analyticsRDD.count
[Stage 2:=================================================>     (472 + 8) / 524]
15/10/30 14:20:00 WARN MemoryStore: Not enough space to cache rdd_84_469 in memory! (computed 13.0 MB so far)
15/10/30 14:20:00 WARN MemoryStore: Not enough space to cache rdd_84_470 in memory! (computed 12.1 MB so far)
15/10/30 14:20:00 WARN MemoryStore: Not enough space to cache rdd_84_476 in memory! (computed 5.6 MB so far)
...
15/10/30 14:20:06 WARN MemoryStore: Not enough space to cache rdd_84_522 in memory! (computed 5.3 MB so far)
[Stage 2:======================================================>(522 + 2) / 524]
15/10/30 14:20:06 WARN MemoryStore: Not enough space to cache rdd_84_521 in memory! (computed 13.9 MB so far)
res11: Long = 7754957


scala> analyticsRDD.count
[Stage 3:=================================================>     (474 + 0) / 524]


*** Restart Spark ***

scala> analyticsRDD.count
res10: Long = 7755043


scala> analyticsRDD.count
res11: Long = 7755050



The cached RDD always gets stuck at the same point.  I tried enabling full
debug logging, but couldn't make out anything useful.
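
For anyone trying to reproduce this, the following is a minimal sketch of how
one could check how many partitions were actually cached and retry with a
storage level that spills to disk. It assumes a spark-shell session where `sc`
and `analyticsRDD` are already defined as in the output above. Using
MEMORY_AND_DISK is an assumption for diagnosis only, not a confirmed fix for
the hang.

```scala
import org.apache.spark.storage.StorageLevel

// Assumes `sc` and `analyticsRDD` already exist in the spark-shell session.
// A storage level cannot be changed while one is assigned, so drop the
// MEMORY_ONLY level set by cache() before choosing another one.
analyticsRDD.unpersist()

// Assumption: let partitions that do not fit in memory spill to local disk
// instead of being recomputed from MongoDB on every later action.
analyticsRDD.persist(StorageLevel.MEMORY_AND_DISK)

// The first action materializes the cache.
analyticsRDD.count()

// Report, for each persisted RDD, how many partitions were stored and where.
sc.getRDDStorageInfo.foreach { info =>
  println(s"rdd ${info.id} (${info.name}): " +
    s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"memory=${info.memSize} bytes, disk=${info.diskSize} bytes")
}
```

If fewer partitions are cached than the RDD has, the remaining ones are
recomputed from the source on each later action, which may be relevant to
where the second count stalls.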


I'm also facing another issue with loading a lot of data from MongoDB,
which might be related, but the error is different:
https://groups.google.com/forum/#!topic/mongodb-user/Knj406szd74


Any ideas?


*    Sampo Niskanen*

*Lead developer / Wellmo*
    sampo.niskanen@wellmo.com
    +358 40 820 5291
