spark-issues mailing list archives

From "flykobe cheng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-7891) Python class in __main__ may trigger AssertionError
Date Wed, 27 May 2015 09:45:17 GMT

     [ https://issues.apache.org/jira/browse/SPARK-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

flykobe cheng updated SPARK-7891:
---------------------------------
    Attachment: demo_error.log
                demo_pickle_error.py

> Python class in __main__ may trigger AssertionError
> ---------------------------------------------------
>
>                 Key: SPARK-7891
>                 URL: https://issues.apache.org/jira/browse/SPARK-7891
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Linux, Python 2.7.3; callbacks pickled by the Python pickle library
>            Reporter: flykobe cheng
>            Priority: Minor
>         Attachments: demo_error.log, demo_pickle_error.py
>
>
> Callback functions passed to Spark transformations and actions are pickled.
> If the callback is an instance method of a class defined in the __main__ module, and that
> class has more than one instance method that uses class attributes or classmethods, the
> class is pickled twice. The pickler's memoize() is then called twice for the same class
> object, which triggers an AssertionError.
> Demo code:
> import logging
> import sys
>
> import pyspark
>
> class AClass(object):
>     _class_var = {'classkey': 'classval'}
>
>     # Both methods read a class attribute through the name AClass, so the
>     # class is reachable from each function's globals when it is pickled.
>     def main_object_method(self, item):
>         logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))
>
>     def main_object_method2(self, item):
>         logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))
>
> def test_main_object_method(sc):
>     obj = AClass()
>     # The bound method is the callback that PySpark has to pickle.
>     res = sc.parallelize(range(4)).map(obj.main_object_method).collect()
>
> if __name__ == '__main__':
>     cf = pyspark.SparkConf()
>     cf.set('spark.cores.max', 1)
>     sc = pyspark.SparkContext(appName = "flykobe_demo_pickle_error", conf = cf)
>     test_main_object_method(sc)
> Traceback:
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
>     save(f_globals)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
>     pickle.Pickler.save_dict(self, obj)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
>     self._batch_setitems(obj.iteritems())
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
>     save(v)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
>     d),obj=obj)
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
>     self.memoize(obj)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
>     assert id(obj) not in self.memo
> AssertionError
> The failing assertion is in memoize() in Python/Lib/pickle.py:
>     def memoize(self, obj):
>         """Store an object in the memo."""
>         if self.fast:
>             return
>         assert id(obj) not in self.memo
>         memo_len = len(self.memo)
>         self.write(self.put(memo_len))
>         self.memo[id(obj)] = memo_len, obj
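
The invariant that memoize() enforces can be illustrated in isolation. The following is only a minimal sketch, not PySpark or cloudpickle code: MiniPickler and save_class_by_value are hypothetical names that mimic a reducer which writes a class by value and memoizes it without first checking the memo, which is the behaviour the traceback suggests. It assumes assertions are enabled (the interpreter is not run with -O).

```python
import io


class MiniPickler(object):
    """Hypothetical toy pickler that keeps only the memo bookkeeping."""

    def __init__(self):
        self.out = io.BytesIO()
        self.memo = {}  # maps id(obj) -> (memo index, obj)

    def memoize(self, obj):
        # Same invariant as the quoted pickle.py: each object is memoized once.
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        # Stand-in for self.write(self.put(memo_len)) in the real pickler.
        self.out.write(("p%d\n" % memo_len).encode("ascii"))
        self.memo[id(obj)] = memo_len, obj

    def save_class_by_value(self, cls):
        # Hypothetical reducer: emits the class, then memoizes it
        # unconditionally instead of consulting self.memo first.
        self.out.write(cls.__name__.encode("ascii"))
        self.memoize(cls)


class AClass(object):
    _class_var = {"classkey": "classval"}


def save_twice():
    """Return True if saving the same class twice trips the assertion."""
    pickler = MiniPickler()
    pickler.save_class_by_value(AClass)  # first visit: stored at memo index 0
    try:
        pickler.save_class_by_value(AClass)  # second visit: id is already in memo
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    print(save_twice())
```

In this sketch the second save fails because nothing checks the memo before memoize() runs again, so any code path that reaches the same class twice during one dump hits the assertion.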



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
