spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-19161) Improving UDF Docstrings
Date Tue, 10 Jan 2017 21:24:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19161:
------------------------------------

    Assignee:     (was: Apache Spark)

> Improving UDF Docstrings
> ------------------------
>
>                 Key: SPARK-19161
>                 URL: https://issues.apache.org/jira/browse/SPARK-19161
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark, SQL
>    Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0
>            Reporter: Maciej Szymkiewicz
>
> Current state
> Right now `udf` returns a `UserDefinedFunction` object, which doesn't provide a meaningful docstring:
> {code}
> In [1]: from pyspark.sql.types import IntegerType
> In [2]: from pyspark.sql.functions import udf
> In [3]: def _add_one(x):
>    ...:     """Adds one"""
>    ...:     if x is not None:
>    ...:         return x + 1
>    ...:
> In [4]: add_one = udf(_add_one, IntegerType())
> In [5]: ?add_one
> Type:        UserDefinedFunction
> String form: <pyspark.sql.functions.UserDefinedFunction object at 0x7f281ed2d198>
> File:        ~/Spark/spark-2.0/python/pyspark/sql/functions.py
> Signature:   add_one(*cols)
> Docstring:
> User defined function in Python
> .. versionadded:: 1.3
> In [6]: help(add_one)
> Help on UserDefinedFunction in module pyspark.sql.functions object:
> class UserDefinedFunction(builtins.object)
>  |  User defined function in Python
>  |  
>  |  .. versionadded:: 1.3
>  |  
>  |  Methods defined here:
>  |  
>  |  __call__(self, *cols)
>  |      Call self as a function.
>  |  
>  |  __del__(self)
>  |  
>  |  __init__(self, func, returnType, name=None)
>  |      Initialize self.  See help(type(self)) for accurate signature.
>  |  
>  |  ----------------------------------------------------------------------
>  |  Data descriptors defined here:
>  |  
>  |  __dict__
>  |      dictionary for instance variables (if defined)
>  |  
>  |  __weakref__
>  |      list of weak references to the object (if defined)
> (END)
> {code}
> It is possible to extract the function:
> {code}
> In [7]: ?add_one.func
> Signature: add_one.func(x)
> Docstring: Adds one
> File:      ~/Spark/spark-2.0/<ipython-input-3-d2d8e4c530ac>
> Type:      function
> In [8]: help(add_one.func)
> Help on function _add_one in module __main__:
> _add_one(x)
>     Adds one
> {code}
> but this assumes that the end user is aware of the distinction between UDFs and built-in functions.
> Proposed
> Copy the input function's docstring to the UDF object or function wrapper.
> {code}
> In [1]: from pyspark.sql.types import IntegerType
> In [2]: from pyspark.sql.functions import udf
> In [3]: def _add_one(x):
>    ...:     """Adds one"""
>    ...:     if x is not None:
>    ...:         return x + 1
>    ...:
> In [4]: add_one = udf(_add_one, IntegerType())
> In [5]: ?add_one
> Signature: add_one(x)
> Docstring:
> Adds one
> SQL Type: IntegerType
> File:      ~/Workspace/spark/<ipython-input-3-d2d8e4c530ac>
> Type:      function
> In [6]: help(add_one)
> Help on function _add_one in module __main__:
> _add_one(x)
>     Adds one
>     
>     SQL Type: IntegerType
> (END)
> {code}




