spark-dev mailing list archives

From pwendell <>
Subject [GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...
Date Thu, 13 Mar 2014 05:15:50 GMT
Github user pwendell commented on a diff in the pull request:
    --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
    @@ -181,15 +178,50 @@ private[spark] class MapOutputTracker(conf: SparkConf) extends Logging
    + * MapOutputTracker for the workers. This uses BoundedHashMap to keep track of
    + * a limited number of most recently used map output information.
    + */
    +private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTracker(conf)
    +  /**
    +   * Bounded HashMap for storing serialized statuses in the worker. This allows
    +   * the HashMap to stay bounded in memory usage. Things dropped from this HashMap will
    +   * automatically be repopulated by fetching them again from the driver. It's okay to
    +   * keep the cache size small, as it is unlikely that there will be a very large number
    +   * of stages active simultaneously in the worker.
    +   */
    +  protected val mapStatuses = new BoundedHashMap[Int, Array[MapStatus]](
    --- End diff --
    Here is what I'm suggesting: when a shuffle dependency goes out of scope in the driver, we
find all of the associated stages. Once the stages are located, we tell each Executor
to clean up everything corresponding to those stages. This would include the map statuses and any
other information. Right now this might have to go through a BlockManager message or something
like that. Basically, I'm suggesting we try to do principled garbage collection based on things
going out of scope whenever possible.
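
To make the proposed flow concrete, here is a minimal sketch of it in plain Scala. It is not code from the pull request and does not use any Spark API: `CleanupStage`, `ExecutorState`, `DriverCleaner`, and `shuffleOutOfScope` are hypothetical names invented for illustration, and the "goes out of scope" event is simulated by an explicit method call rather than detected by the JVM.

```scala
import scala.collection.mutable

// Hypothetical cleanup message; the comment above only says it might travel
// as "a BlockManager message or something like that".
final case class CleanupStage(stageId: Int)

// Hypothetical stand-in for the per-stage state an executor holds.
final class ExecutorState(val id: String) {
  private val mapStatuses = mutable.Map.empty[Int, Array[String]]
  private val otherStageData = mutable.Map.empty[Int, String]

  // Record some state for a stage, as an executor would while running tasks.
  def register(stageId: Int, statuses: Array[String]): Unit = {
    mapStatuses(stageId) = statuses
    otherStageData(stageId) = s"aux-$stageId"
  }

  // Drop everything that corresponds to the stage named in the message.
  def receive(msg: CleanupStage): Unit = {
    mapStatuses.remove(msg.stageId)
    otherStageData.remove(msg.stageId)
    ()
  }

  def holdsStage(stageId: Int): Boolean =
    mapStatuses.contains(stageId) || otherStageData.contains(stageId)
}

// Hypothetical driver-side bookkeeping: which stages belong to which shuffle.
final class DriverCleaner(executors: Seq[ExecutorState]) {
  private val stagesByShuffle = mutable.Map.empty[Int, mutable.Set[Int]]

  def registerStage(shuffleId: Int, stageId: Int): Unit = {
    stagesByShuffle.getOrElseUpdate(shuffleId, mutable.Set.empty[Int]).add(stageId)
    ()
  }

  // Called when a shuffle dependency is no longer referenced in the driver.
  // Finds the associated stages, tells every executor to clean them up,
  // and returns the stage ids that were cleaned.
  def shuffleOutOfScope(shuffleId: Int): Set[Int] = {
    val stages = stagesByShuffle.remove(shuffleId).map(_.toSet).getOrElse(Set.empty[Int])
    for (stageId <- stages; executor <- executors) {
      executor.receive(CleanupStage(stageId))
    }
    stages
  }
}
```

In a real implementation the call to `shuffleOutOfScope` would presumably be triggered by the driver noticing that the dependency object is unreachable (for example via a `java.lang.ref.WeakReference` registered with a `ReferenceQueue`), and the message would be delivered over the network instead of by a direct method call. Both of those parts are left out here.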
