spark-issues mailing list archives

From "Annamalai Venugopal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22792) PySpark UDF registering issue
Date Fri, 15 Dec 2017 09:44:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292266#comment-16292266 ]

Annamalai Venugopal commented on SPARK-22792:
---------------------------------------------

The failing code is in cloudpickle.py.


> PySpark UDF registering issue
> -----------------------------
>
>                 Key: SPARK-22792
>                 URL: https://issues.apache.org/jira/browse/SPARK-22792
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.1
>         Environment: Windows OS, Python, PyCharm, Spark
>            Reporter: Annamalai Venugopal
>              Labels: windows
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I am working on a project with PySpark and I am stuck on the following issue:
> Traceback (most recent call last):
>   File "C:/Users/avenugopal/PycharmProjects/POC_for_vectors/main.py", line 187, in <module>
>     hypernym_extracted_data = result.withColumn("hypernym_extracted_data", hypernym_fn(F.column("token_extracted_data")))
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\sql\functions.py", line 1957, in wrapper
>     return udf_obj(*args)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\sql\functions.py", line 1916, in __call__
>     judf = self._judf
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\sql\functions.py", line 1900, in _judf
>     self._judf_placeholder = self._create_judf()
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\sql\functions.py", line 1909, in _create_judf
>     wrapped_func = _wrap_function(sc, self.func, self.returnType)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\sql\functions.py", line 1866, in _wrap_function
>     pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\rdd.py", line 2374, in _prepare_for_python_RDD
>     pickled_command = ser.dumps(command)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\serializers.py", line 460, in dumps
>     return cloudpickle.dumps(obj, 2)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\cloudpickle.py", line 704, in dumps
>     cp.dump(obj)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\cloudpickle.py", line 148, in dump
>     return Pickler.dump(self, obj)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 409, in dump
>     self.save(obj)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 476, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 736, in save_tuple
>     save(element)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 476, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\cloudpickle.py", line 249, in save_function
>     self.save_function_tuple(obj)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\cloudpickle.py", line 297, in save_function_tuple
>     save(f_globals)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 476, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 821, in save_dict
>     self._batch_setitems(obj.items())
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 852, in _batch_setitems
>     save(v)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 476, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\cloudpickle.py", line 249, in save_function
>     self.save_function_tuple(obj)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\cloudpickle.py", line 297, in save_function_tuple
>     save(f_globals)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 476, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 821, in save_dict
>     self._batch_setitems(obj.items())
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 852, in _batch_setitems
>     save(v)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\pickle.py", line 521, in save
>     self.save_reduce(obj=obj, *rv)
>   File "C:\Users\avenugopal\AppData\Local\Programs\Python\Python36\lib\site-packages\pyspark\cloudpickle.py", line 565, in save_reduce
>     "args[0] from __newobj__ args has the wrong class")
> _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
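
The traceback shows the serializer walking the UDF's globals (save_function_tuple -> save(f_globals) -> save_dict), entering a second function's globals the same way, and failing on one of the values it finds there. One way to narrow down which global is responsible might be a small helper like the sketch below. It is only an approximation: it uses the standard-library pickle rather than Spark's bundled cloudpickle, it skips modules, and it does not look inside nested code objects such as lambdas. The names find_unpicklable_globals, _lock, _stopwords, helper and hypernym_udf_body are hypothetical stand-ins, not taken from the reporter's main.py.

```python
import pickle
import threading
import types


def find_unpicklable_globals(func, _seen=None):
    """Map each global name reachable from func to the exception type raised
    when pickling its value. Plain Python functions are followed recursively,
    roughly mirroring the nested save(f_globals) calls in the traceback."""
    seen = set() if _seen is None else _seen
    failures = {}
    if id(func) in seen:
        return failures  # already visited; avoids infinite recursion
    seen.add(id(func))
    for name in func.__code__.co_names:
        if name not in func.__globals__:
            continue  # builtin or attribute name, not a module-level global
        value = func.__globals__[name]
        if isinstance(value, types.ModuleType):
            continue  # assumption: modules are handled by the real serializer
        if isinstance(value, types.FunctionType):
            failures.update(find_unpicklable_globals(value, seen))
            continue
        try:
            pickle.dumps(value)
        except Exception as exc:
            failures[name] = type(exc).__name__
    return failures


# Hypothetical stand-ins for module-level state referenced by a UDF.
_lock = threading.Lock()      # lock objects cannot be pickled
_stopwords = {"a", "the"}     # a plain set pickles without trouble


def helper(token):
    with _lock:
        return token not in _stopwords


def hypernym_udf_body(tokens):
    return list(filter(helper, tokens))


report = find_unpicklable_globals(hypernym_udf_body)
print(report)
```

If a name shows up in the report, a common workaround is to create that object inside the UDF body (or inside a function it calls) instead of at module level, so it is never captured in the function's globals.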



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


