spark-issues mailing list archives

From "flykobe cheng (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-7892) Python class in __main__ may trigger AssertionError
Date Wed, 27 May 2015 09:35:18 GMT
flykobe cheng created SPARK-7892:
------------------------------------

             Summary: Python class in __main__ may trigger AssertionError
                 Key: SPARK-7892
                 URL: https://issues.apache.org/jira/browse/SPARK-7892
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
         Environment: Linux, Python 2.7.3; callbacks pickled with the Python pickle library
            Reporter: flykobe cheng
            Priority: Minor


Callback functions passed to Spark transformations and actions are pickled.
If the callback is an instance method of a class defined in the __main__ module, and the
class has more than one instance method that uses class attributes or class methods, the
class is pickled twice. pickle's memoize() is then called twice for the same class object,
which triggers an AssertionError.

Demo code:
import logging
import sys

import pyspark


class AClass(object):
    _class_var = {'classkey': 'classval'}

    def main_object_method(self, item):
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

    def main_object_method2(self, item):
        logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))

        
def test_main_object_method(sc):
    obj = AClass()
    res = sc.parallelize(range(4)).map(obj.main_object_method).collect()


if __name__ == '__main__':
    cf = pyspark.SparkConf()
    cf.set('spark.cores.max', 1)

    sc = pyspark.SparkContext(appName = "flykobe_demo_pickle_error", conf = cf)

    test_main_object_method(sc)


Traceback:
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
    save(f_globals)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
    pickle.Pickler.save_dict(self, obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
    save(v)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
    d),obj=obj)
  File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
    self.memoize(obj)
  File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
    assert id(obj) not in self.memo
AssertionError


Problem in Python/Lib/pickle.py:
    def memoize(self, obj):
        """Store an object in the memo."""
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj
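
The effect of that assertion can be reproduced without Spark. The following is only a
minimal sketch: MemoSketch, save_checked and memoize_twice are hypothetical names made up
for illustration, not part of pickle or cloudpickle, and the "PUT"/"GET" strings are
readable stand-ins for the real opcodes. It models just the memo bookkeeping quoted above,
and shows that memoizing the same object twice fails, while looking the object up in the
memo first does not.

```python
class MemoSketch(object):
    """Hypothetical stand-in for the memo bookkeeping in pickle.Pickler."""

    def __init__(self):
        self.memo = {}      # id(obj) -> (memo index, obj), as in pickle.py
        self.opcodes = []   # readable stand-ins for the real pickle opcodes

    def memoize(self, obj):
        # Mirrors the guard in pickle.py. It is raised explicitly here so the
        # sketch behaves the same when Python runs with assertions disabled.
        if id(obj) in self.memo:
            raise AssertionError("object memoized twice: %r" % (obj,))
        memo_len = len(self.memo)
        self.opcodes.append("PUT %d" % memo_len)
        self.memo[id(obj)] = memo_len, obj

    def save_checked(self, obj):
        # Consult the memo first and emit a back-reference for a known object.
        entry = self.memo.get(id(obj))
        if entry is not None:
            self.opcodes.append("GET %d" % entry[0])
            return
        self.memoize(obj)


class AClass(object):
    pass


def memoize_twice(obj):
    """Return True if memoizing obj a second time raises AssertionError."""
    sketch = MemoSketch()
    sketch.memoize(obj)
    try:
        sketch.memoize(obj)
    except AssertionError:
        return True
    return False


# Going through the memo lookup records one store and one back-reference.
checked = MemoSketch()
checked.save_checked(AClass)
checked.save_checked(AClass)
```

In this sketch the second direct memoize() call is what fails, which matches the last two
frames of the traceback, where save_reduce calls self.memoize(obj) for a class that is
already in the memo.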



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
