www-infrastructure-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kellrott <...@git.apache.org>
Subject [GitHub] incubator-spark pull request: Patch for SPARK-942
Date Tue, 25 Feb 2014 06:48:28 GMT
Github user kellrott commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/180#discussion_r10025975
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManager) extends
Logging {
               val computedValues = rdd.computeOrReadCheckpoint(split, context)
               // Persist the result, so long as the task is not running locally
               if (context.runningLocally) { return computedValues }
    -          val elements = new ArrayBuffer[Any]
    -          elements ++= computedValues
    -          blockManager.put(key, elements, storageLevel, tellMaster = true)
    -          elements.iterator.asInstanceOf[Iterator[T]]
    +          if (storageLevel.useDisk && !storageLevel.useMemory) {
    +            blockManager.put(key, computedValues, storageLevel, tellMaster = true)
    +            return blockManager.get(key) match {
    +              case Some(values) =>
    +                return new InterruptibleIterator(context, values.asInstanceOf[Iterator[T]])
    +              case None =>
    +                logInfo("Failure to store %s".format(key))
    +                return null
    +            }
    +          } else {
    +            val elements = new ArrayBuffer[Any]
    +            elements ++= computedValues
    +            blockManager.put(key, elements, storageLevel, tellMaster = true)
    --- End diff --
    
    I think the logic here is that because the function has to return an iterator, you want
to copy it before passing it to the block manager, the elements ++= computeValues is just
a caching step. You don't know if it is a one time iterator and blockManager.put could consume
the iterator and then only return the number of bytes written.
    It's not an issue with the to-disk copy, because you let the 'put' consume the iterator
and then use a 'get' to grab a new iterator that reads off the disk. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message