spark-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10
Date Tue, 07 Jun 2016 22:22:54 GMT
With commit 200f01c8fb15680b5630fbd122d44f9b1d096e02 using Scala 2.11:

Using Python version 2.7.9 (default, Apr 29 2016 10:48:06)
SparkSession available as 'spark'.
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.types import IntegerType, StructField, StructType
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import Row
>>> spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
>>> add_one = udf(lambda x: x + 1, IntegerType())
>>> schema = StructType([StructField('a', IntegerType(), False)])
>>> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
>>> df.select(add_one(df.a).alias('incremented')).collect()
[Row(incremented=2), Row(incremented=3)]

Let me build with Scala 2.10 and try again.

On Tue, Jun 7, 2016 at 2:47 PM, Franklyn D'souza <franklyn.dsouza@shopify.com> wrote:

> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following:
>>
>>
>> ./dev/change-version-to-2.10.sh
>> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5
>> -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
>
>
> and then ran the following code in a pyspark shell
>
> from pyspark.sql import SparkSession
>> from pyspark.sql.types import IntegerType, StructField, StructType
>> from pyspark.sql.functions import udf
>> from pyspark.sql.types import Row
>> spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
>> add_one = udf(lambda x: x + 1, IntegerType())
>> schema = StructType([StructField('a', IntegerType(), False)])
>> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
>> df.select(add_one(df.a).alias('incremented')).collect()
>
>
> This never returns with a result.
>
>
>
