spark-user mailing list archives

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Subject: Re: registering udf to use in spark.sql('select...
Date: Thu, 04 Aug 2016 19:07:45 GMT
Yes, pretty straightforward: define, register, and use.

// strip the leading currency symbol and the thousands separators
def cleanupCurrency(word: String): Double = {
  word.substring(1).replace(",", "").toDouble
}

sqlContext.udf.register("cleanupCurrency", cleanupCurrency(_: String))


// Invoices is a case class defined elsewhere; keep rows with a non-empty
// Total, then clean up the three currency columns
val a = df.filter(col("Total") > "").map(p =>
  Invoices(p(0).toString, p(1).toString,
    cleanupCurrency(p(2).toString),
    cleanupCurrency(p(3).toString),
    cleanupCurrency(p(4).toString)))
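
For completeness, a rough pyspark 2.0 sketch of the same define/register/use
flow (untested; "invoices" is just an illustrative view name, and df is
assumed to be the DataFrame above):

from pyspark.sql.types import DoubleType

def cleanupCurrency(word):
    # drop the leading currency symbol and the thousands separators
    return float(word[1:].replace(",", ""))

# register under a SQL-visible name with an explicit return type
spark.udf.register("cleanupCurrency", cleanupCurrency, DoubleType())

# expose the DataFrame to SQL, then call the UDF inside spark.sql()
df.createOrReplaceTempView("invoices")
spark.sql("select cleanupCurrency(Total) as total from invoices").show()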

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


On 4 August 2016 at 17:09, Nicholas Chammas <nicholas.chammas@gmail.com>
wrote:

> No, SQLContext is not disappearing. The top-level class is replaced by
> SparkSession, but you can always get the underlying context from the
> session.
>
> You can also use SparkSession.udf.register()
> <http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SparkSession.udf>,
> which is just a wrapper for sqlContext.registerFunction
> <https://github.com/apache/spark/blob/2182e4322da6ba732f99ae75dce00f76f1cdc4d9/python/pyspark/sql/context.py#L511-L520>
> .
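> A minimal illustration of the two equivalent calls in pyspark 2.0, assuming
> a shell where both spark and sqlContext are defined (squareIt is just a
> stand-in function):
>
> from pyspark.sql.types import IntegerType
>
> def squareIt(x):
>     return x * x
>
> # 2.0 style, via the session
> spark.udf.register("squareIt", squareIt, IntegerType())
>
> # pre-2.0 style, via the context; same effect per the wrapper noted above
> sqlContext.registerFunction("squareIt", squareIt, IntegerType())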
>
> On Thu, Aug 4, 2016 at 12:04 PM Ben Teeuwen <bteeuwen@gmail.com> wrote:
>
>> Yes, but I don’t want to use it in a select() call. I want to use either
>> selectExpr() or spark.sql(), with the udf being called inside a string.
>>
>> Now I got it to work using
>> sqlContext.registerFunction('encodeOneHot_udf', encodeOneHot, VectorUDT()).
>> But this sqlContext approach will disappear, right? So I’m curious what
>> to use instead.
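>>
>> For reference, a sketch of the session-based form of that same call
>> (encodeOneHot as defined earlier on my side; the VectorUDT import path is
>> assumed to be the 2.0 one under pyspark.ml):
>>
>> from pyspark.ml.linalg import VectorUDT
>>
>> spark.udf.register('encodeOneHot_udf', encodeOneHot, VectorUDT())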
>>
>> On Aug 4, 2016, at 3:54 PM, Nicholas Chammas <nicholas.chammas@gmail.com>
>> wrote:
>>
>> Have you looked at pyspark.sql.functions.udf and the associated examples?
>> On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen <bteeuwen@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I’d like to use a UDF in pyspark 2.0, as in:
>>> ________
>>>
>>> def squareIt(x):
>>>   return x * x
>>>
>>> # register the function and define return type
>>> ….
>>>
>>> spark.sql("""select myUdf(adgroupid, 'extra_string_parameter') as
>>> function_result from df""")
>>>
>>> _________
>>>
>>> How can I register the function? I only see registerFunction in the
>>> deprecated sqlContext at
>>> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
>>> As the ‘spark’ object unifies hiveContext and sqlContext, what is the
>>> new way to go?
>>>
>>> Ben
>>>
>>
>>
