spark-dev mailing list archives

From agg212 <...@cs.brown.edu>
Subject Re: Pickle Spark DataFrame
Date Wed, 28 Oct 2015 15:27:21 GMT
I would like to put a Spark DataFrame in a manager.dict() and get it back out
later (manager.dict() pickles every object stored in it). Ideally, I would
store only a pointer to the DataFrame object, so that the data stays
distributed within Spark rather than being materialized and then stored.
Here is an example:

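from multiprocessing import Manager

# assumes an existing SQLContext named sqlContext
data = sqlContext.read.json(data_file)  # load file as a DataFrame
cache = Manager().dict()  # process-safe container; pickles stored values
cache['id'] = data  # desired: store a reference, not the materialized result
new_data = cache['id']  # desired: get back the distributed DataFrame
new_data.show()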
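One possible workaround, sketched here under stated assumptions, is to keep the live object in a process-local registry and put only a small picklable handle in the shared container. This is not a Spark API: `FakeDataFrame`, `register`, and `resolve` are hypothetical names, the stand-in class only imitates an object that cannot be pickled, and `pickle.dumps`/`pickle.loads` stand in for the round trip that manager.dict() performs on stored values.

```python
import pickle
import threading
import uuid

# Process-local registry: handle string -> live object. It is never pickled.
_REGISTRY = {}


class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame: it holds an unpicklable resource."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # lock objects cannot be pickled


def register(obj):
    """Store obj locally and return a plain-string handle that pickles fine."""
    handle = uuid.uuid4().hex
    _REGISTRY[handle] = obj
    return handle


def resolve(handle):
    """Look up the live object for a handle (only valid in this process)."""
    return _REGISTRY[handle]


df = FakeDataFrame("events")

# Pickling the object itself fails, which is what a managed dict would hit.
try:
    pickle.dumps(df)
    picklable = True
except (TypeError, pickle.PicklingError):
    picklable = False

# The handle survives the pickle round trip and resolves to the same object.
handle = register(df)
round_tripped = pickle.loads(pickle.dumps(handle))
same_object = resolve(round_tripped) is df
```

The handle is only meaningful inside the process that owns the registry, so this helps when several threads of one driver program share the cache, not when separate processes need the DataFrame itself.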




