spark-dev mailing list archives

From Justin Uang <justin.u...@gmail.com>
Subject Using UDFs in Java without registration
Date Fri, 29 May 2015 19:54:24 GMT
I would like to define a UDF in Java via a closure and then use it without
registration. In Scala, I believe there are two ways to do this:

    val myUdf = functions.udf((age: Int) => age + 5)
    myDf.select(myUdf(myDf("age")))

or

    myDf.select(functions.callUDF((age: Int) => age + 5,
        DataTypes.IntegerType, myDf("age")))

However, neither of these works for a Java UDF. The first one requires
TypeTags. For the second one, I was able to hack around it by creating a
Scala AbstractFunction1 and passing it to callUDF, which takes an explicitly
declared catalyst DataType instead of relying on TypeTags. It was still
nasty, though, because I had to return a Scala map instead of a Java map.

Is there first-class support for creating
an org.apache.spark.sql.UserDefinedFunction that works with
org.apache.spark.sql.api.java.UDF1<T1, R>? I'm fine with having to
declare the catalyst type when creating it.
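
A minimal sketch of the shape such an API could take, written in plain Java
so it runs without Spark on the classpath. Everything here is an assumption
for illustration: the UDF1 interface only mirrors the shape of
org.apache.spark.sql.api.java.UDF1<T1, R>, while DeclaredType, JavaUdf, and
UdfSketch are hypothetical names that do not exist in Spark. A "column" is
modeled as a java.util.List.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Stand-in that mirrors the shape of org.apache.spark.sql.api.java.UDF1<T1, R>.
interface UDF1<T1, R> extends Serializable {
    R call(T1 t1) throws Exception;
}

// Hypothetical stand-in for a catalyst DataType; it only carries a name.
final class DeclaredType {
    static final DeclaredType INTEGER = new DeclaredType("integer");
    final String name;

    private DeclaredType(String name) {
        this.name = name;
    }
}

// Hypothetical wrapper: pairs a Java closure with an explicitly declared
// return type, the information that TypeTags supply on the Scala side.
final class JavaUdf<T1, R> {
    private final UDF1<T1, R> f;
    private final DeclaredType returnType;

    private JavaUdf(UDF1<T1, R> f, DeclaredType returnType) {
        this.f = f;
        this.returnType = returnType;
    }

    // No registration step: the caller gets a usable object straight away.
    static <T1, R> JavaUdf<T1, R> of(UDF1<T1, R> f, DeclaredType returnType) {
        return new JavaUdf<>(f, returnType);
    }

    DeclaredType returnType() {
        return returnType;
    }

    // Applies the closure to every value of a "column" (modeled as a List).
    List<R> applyTo(List<T1> column) {
        List<R> out = new ArrayList<>(column.size());
        for (T1 value : column) {
            try {
                out.add(f.call(value));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return out;
    }
}

// Usage: define the UDF inline as a closure and apply it directly.
final class UdfSketch {
    static List<Integer> addFive(List<Integer> ages) {
        JavaUdf<Integer, Integer> plusFive =
            JavaUdf.of(age -> age + 5, DeclaredType.INTEGER);
        return plusFive.applyTo(ages);
    }
}
```

The point of the sketch is the factory signature: a Java functional
interface plus a declared return type is enough to build the function
object, so the caller never has to produce a TypeTag or a Scala collection.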

If it doesn't exist, I would be happy to work on it =)

Justin
