spark-dev mailing list archives

From agg212
Subject Re: Pickle Spark DataFrame
Date Wed, 28 Oct 2015 15:27:21 GMT
I would just like to be able to put a Spark DataFrame into a
multiprocessing Manager().dict() and get it back out (the managed dict
pickles every object stored in it).  Ideally, I would store only a pointer to
the DataFrame object so that the data stays distributed within Spark (i.e.,
it is not materialized first and then stored).  Here is an example:

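from multiprocessing import Manager

# sqlContext is assumed to be an existing SQLContext; SparkContext has no jsonFile method
data = sqlContext.read.json(data_file)  # load file as a DataFrame
cache = Manager().dict()  # shared container; pickles the values it stores
cache['id'] = data  # store reference to data, not materialized result
new_data = cache['id']  # get reference to distributed Spark DataFrame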
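
To make the "pointer" idea concrete, here is a rough sketch of what I have in mind. It assumes that storing a small string handle in the shared dict, and keeping the real object in a process-local registry, would be acceptable. The names register, resolve, and FakeDataFrame are hypothetical and not part of Spark; FakeDataFrame is only a stand-in for an object that cannot be pickled, and pickle is called directly in place of the managed dict.

```python
import pickle
import threading
import uuid

# Process-local registry: handle string -> live object (never pickled).
_registry = {}


def register(obj):
    """Store obj locally and return a small, picklable string handle."""
    handle = uuid.uuid4().hex
    _registry[handle] = obj
    return handle


def resolve(handle):
    """Look up the live object for a handle in this process."""
    return _registry[handle]


class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame: it holds an unpicklable lock."""

    def __init__(self):
        self._lock = threading.Lock()


df = FakeDataFrame()
handle = register(df)

# Only the handle goes through pickle, which is what a managed dict would do.
restored = pickle.loads(pickle.dumps(handle))
same_object = resolve(restored) is df
```

The obvious limitation is that a handle only resolves in the process that registered the object, so this keeps the data distributed but does not share the DataFrame itself across processes.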
