spark-reviews mailing list archives

From JoshRosen <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-12817] Add BlockManager.getOrElseUpdate...
Date Mon, 29 Feb 2016 22:42:42 GMT
GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/11436

    [SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManager

    CacheManager directly calls MemoryStore.unrollSafely() and has its own logic for handling
    graceful fallback to disk when cached data does not fit in memory. However, this logic also
    exists inside of the MemoryStore itself, so this appears to be unnecessary duplication.
    
    Thanks to the addition of block-level read/write locks in #10705, we can refactor the
    code to remove the CacheManager and replace it with an atomic `BlockManager.getOrElseUpdate()`
    method.
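
The "get or else update" idea described above can be sketched in isolation. This is a minimal standalone illustration in Java, not the code from this pull request: `TinyBlockStore`, its `getOrElseUpdate` signature, and the block id `rdd_0_0` are hypothetical names, and a single coarse `synchronized` lock stands in for the block-level read/write locks from #10705. It only shows why doing the lookup and the store in one atomic step means callers no longer need their own caching logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class GetOrElseUpdateSketch {

    // Hypothetical stand-in for a block store; this is NOT Spark's BlockManager.
    static final class TinyBlockStore<K, V> {
        private final Map<K, V> blocks = new HashMap<>();

        // Returns the stored value for blockId, computing and storing it only
        // when it is absent. One coarse lock makes check-then-write atomic;
        // this is an assumption made for brevity, the pull request relies on
        // per-block read/write locks instead.
        synchronized V getOrElseUpdate(K blockId, Supplier<V> compute) {
            V cached = blocks.get(blockId);
            if (cached != null) {
                return cached;
            }
            V computed = compute.get();
            blocks.put(blockId, computed);
            return computed;
        }
    }

    public static void main(String[] args) {
        TinyBlockStore<String, Integer> store = new TinyBlockStore<>();
        AtomicInteger computations = new AtomicInteger();

        // The first call computes the value; the second finds it already stored.
        int first = store.getOrElseUpdate("rdd_0_0", () -> {
            computations.incrementAndGet();
            return 42;
        });
        int second = store.getOrElseUpdate("rdd_0_0", () -> {
            computations.incrementAndGet();
            return 99;
        });

        System.out.println(first + " " + second + " " + computations.get());
    }
}
```

Because the caller only supplies the computation, the decision about where the result lives stays inside the store, which is the kind of duplication the description says CacheManager introduced.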
    
    This pull request replaces / subsumes #10748.
    
    /cc @andrewor14 and @nongli for review. Note that this changes the locking semantics of
    a couple of internal BlockManager methods (`doPut()` and `lockNewBlockForWriting()`), so please
    pay attention to the Scaladoc changes and new test cases for those methods.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark remove-cachemanager

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11436.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11436
    
----
commit 31e2ec371dd4966fba1e713a32a6adb7cc76141e
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-02-29T19:17:35Z

    Change put() methods to release locks after they return.
    
    Previously these methods would downgrade the exclusive write lock to a shared
    read lock, but this behavior is only needed in one place (CacheManager) and I'm
    planning to replace that with a BlockManager getOrElseUpdate method, so it makes
    sense to make lock downgrading the exception rather than the common case.
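
The difference between releasing a write lock and downgrading it, as discussed in the commit message above, can be illustrated with a small standalone Java sketch. It uses `java.util.concurrent.locks.ReentrantReadWriteLock` as an assumption for illustration only; Spark's block locks are its own mechanism, and the `Block` class and the `put`, `putAndKeepReadLock`, and `releaseReadLock` methods here are hypothetical names rather than anything in this pull request.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockDowngradeSketch {

    // Hypothetical block guarded by a read/write lock; not Spark code.
    static final class Block {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private String data;

        // Common case: write under the exclusive lock, then release it fully.
        void put(String value) {
            lock.writeLock().lock();
            try {
                data = value;
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Exceptional case: write, then downgrade to a shared read lock.
        // The read lock is taken while the write lock is still held, so no
        // other writer can slip in between. The caller must later call
        // releaseReadLock().
        String putAndKeepReadLock(String value) {
            lock.writeLock().lock();
            try {
                data = value;
                lock.readLock().lock();
            } finally {
                lock.writeLock().unlock();
            }
            return data;
        }

        void releaseReadLock() {
            lock.readLock().unlock();
        }

        // Number of read locks held by the calling thread.
        int readHoldCount() {
            return lock.getReadHoldCount();
        }

        boolean isWriteLocked() {
            return lock.isWriteLocked();
        }
    }

    public static void main(String[] args) {
        Block block = new Block();

        block.put("first");
        System.out.println("read holds after put: " + block.readHoldCount());

        String value = block.putAndKeepReadLock("second");
        System.out.println("read holds after downgrade: " + block.readHoldCount());

        block.releaseReadLock();
        System.out.println("value: " + value);
    }
}
```

Making the plain release the default keeps callers from having to remember a matching unlock, and only the one path that needs to read the freshly written value pays for the downgrade.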

commit d6ce63dbf3d4009af71df52bbcf8c183da4a5f29
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-02-29T22:16:26Z

    Add getOrCompute() to replace CacheManager usage in RDD.

commit e5f505e4b6203bf27559e60efe044f4568720a19
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-02-29T22:18:04Z

    Remove CacheManager

commit 2613038512ade9082ec5f3d58b4d471bdc01ca50
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-02-29T22:24:46Z

    Remove BlockStore / BlockManager putArray() method (since it's now unused).

commit 0c48c632f81205d2d7447a70ba626297aa23e8c9
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-02-29T22:27:15Z

    Inline MemoryStore.putArray() at its only callsite.

commit 8f6cc09a49904360ff10ef986150144d60182e06
Author: Josh Rosen <joshrosen@databricks.com>
Date:   2016-02-29T22:33:33Z

    Trim some excess whitespace.

----


