spark-reviews mailing list archives

From HyukjinKwon <...@git.apache.org>
Subject [GitHub] spark pull request #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with retur...
Date Sat, 13 Jan 2018 04:39:47 GMT
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20163#discussion_r161363630
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
    @@ -144,6 +145,7 @@ object EvaluatePython {
         }
     
         case StringType => (obj: Any) => nullSafeConvert(obj) {
    +      case _: Calendar => null
           case _ => UTF8String.fromString(obj.toString)
    --- End diff --
    
    BTW, it seems there is another hole: the internal conversion on the Python side also fails when the UDF returns a value of an unexpected type:
    
    ```python
    >>> from pyspark.sql.functions import udf
    >>> f = udf(lambda x: x, "date")
    >>> spark.range(1).select(f("id")).show()
    ```
    
    ```
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "./python/pyspark/worker.py", line 229, in main
        process()
      File "./python/pyspark/worker.py", line 224, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "./python/pyspark/worker.py", line 149, in <lambda>
        func = lambda _, it: map(mapper, it)
      File "<string>", line 1, in <lambda>
      File "./python/pyspark/worker.py", line 72, in <lambda>
        return lambda *a: toInternal(f(*a))
      File "/.../pyspark/sql/types.py", line 175, in toInternal
        return d.toordinal() - self.EPOCH_ORDINAL
    AttributeError: 'int' object has no attribute 'toordinal'
    ```
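
The failure in the traceback can be reproduced without Spark. The following is a minimal sketch, assuming the date conversion does nothing more than the `d.toordinal() - self.EPOCH_ORDINAL` line shown above; `to_internal_date` and `to_internal_date_checked` are hypothetical names introduced for illustration, and the guarded variant is only one possible way to handle the mismatch, not the change made in this pull request.

```python
import datetime

# Ordinal of the Unix epoch, used to express a date as days since 1970-01-01.
EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()


def to_internal_date(d):
    """Stand-in for the conversion in the traceback: days since the epoch.

    Assumes `d` is a date-like object; any other non-None value fails here.
    """
    if d is not None:
        return d.toordinal() - EPOCH_ORDINAL


def to_internal_date_checked(d):
    """Hypothetical guarded variant: yields None for values that are not dates."""
    if isinstance(d, datetime.date):
        return d.toordinal() - EPOCH_ORDINAL
    return None


if __name__ == "__main__":
    # spark.range(1) yields the integer id 0, which the identity UDF returns as-is.
    try:
        to_internal_date(0)
    except AttributeError as exc:
        print("unguarded conversion failed:", exc)

    print("guarded conversion returned:", to_internal_date_checked(0))
```

With the unguarded version, an `int` reaches `toordinal()` and raises the same `AttributeError` as in the worker traceback; the guarded version turns the mismatched value into `None` instead.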


