spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <>
Subject [jira] [Commented] (SPARK-16921) RDD/DataFrame persist() and cache() should return Python context managers
Date Wed, 10 Aug 2016 07:19:20 GMT


Nick Pentreath commented on SPARK-16921:

By the way, for broadcast variables, I wonder whether {{__exit__}} should call {{unpersist}} or {{destroy}}.
Probably {{destroy}}, since it is closer to {{close}}-style semantics.
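
To make the difference concrete, here is a rough sketch of the {{destroy}}-on-exit behaviour. It assumes a hypothetical stand-in class, {{FakeBroadcast}}, and a hypothetical helper, {{scoped}}; neither is part of PySpark, and the real broadcast class is not touched. The stand-in only mimics the documented distinction: {{unpersist}} drops cached copies but leaves the variable usable, while {{destroy}} makes it unusable.

```python
from contextlib import contextmanager


class FakeBroadcast:
    """Hypothetical stand-in for a broadcast variable (not the PySpark class)."""

    def __init__(self, value):
        self._value = value
        self.persisted = True
        self.destroyed = False

    @property
    def value(self):
        # A destroyed variable can no longer be read.
        if self.destroyed:
            raise RuntimeError("broadcast variable was destroyed")
        return self._value

    def unpersist(self):
        # Drops cached copies; the variable itself stays usable.
        self.persisted = False

    def destroy(self):
        # Releases everything; the variable is unusable afterwards.
        self.persisted = False
        self.destroyed = True
        self._value = None


@contextmanager
def scoped(bc):
    """Hypothetical helper: yield bc, then destroy it on exit, like closing a file."""
    try:
        yield bc
    finally:
        bc.destroy()


lookup = FakeBroadcast({"a": 1})
with scoped(lookup) as bc:
    total = bc.value["a"] + 1
```

After the block, {{lookup}} cannot be read again, which matches how a closed file behaves. If {{__exit__}} called {{unpersist}} instead, the variable would remain usable after the block, which may be surprising for something scoped by {{with}}.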

> RDD/DataFrame persist() and cache() should return Python context managers
> -------------------------------------------------------------------------
>                 Key: SPARK-16921
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core, SQL
>            Reporter: Nicholas Chammas
>            Priority: Minor
> [Context managers|] are a natural way to capture closely related setup and teardown code in Python.
> For example, they are commonly used when doing file I/O:
> {code}
> with open('/path/to/file') as f:
>     contents = f.read()
>     ...
> {code}
> Once the program exits the with block, {{f}} is automatically closed.
> I think it makes sense to apply this pattern to persisting and unpersisting DataFrames and RDDs. There are many cases where you want to persist a DataFrame for a specific set of operations and then unpersist it immediately afterwards.
> For example, take model training. Today, you might do something like this:
> {code}
> labeled_data.persist()
> model =
> labeled_data.unpersist()
> {code}
> If {{persist()}} returned a context manager, you could rewrite this as follows:
> {code}
> with labeled_data.persist():
>     model =
> {code}
> Upon exiting the {{with}} block, {{labeled_data}} would automatically be unpersisted.
> This can be done in a backwards-compatible way, since {{persist()}} would still return the parent DataFrame or RDD as it does today. The object would simply gain two methods: {{__enter__()}} and {{__exit__()}}.
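
A minimal sketch of the proposal, for illustration only. {{FakeDataFrame}} is a hypothetical stand-in, not the PySpark class, and the {{is_cached}} flag here is just a plain attribute used to observe the state. The point is that {{persist()}} keeps returning the object itself, and the two extra methods make that object usable in a {{with}} statement.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame or RDD (not the PySpark class)."""

    def __init__(self):
        self.is_cached = False

    def persist(self):
        # Unchanged behaviour: mark as persisted and return the same object.
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self

    def __enter__(self):
        # The object returned by persist() acts as its own context manager.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Unpersist on the way out, whether or not the block raised.
        self.unpersist()
        return False  # never swallow exceptions from the block


labeled_data = FakeDataFrame()

# New style: scoped persistence.
with labeled_data.persist() as cached:
    cached_inside = cached.is_cached
cached_after = labeled_data.is_cached

# Old style still works, because persist() returns the object as before.
same_object = labeled_data.persist() is labeled_data
labeled_data.unpersist()
```

Because {{__exit__}} returns {{False}}, an exception raised inside the block still propagates, but the data is unpersisted first.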
