spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: unsure how to create 2 outputs from spark-sql udf expression
Date Thu, 26 May 2016 16:46:57 GMT
that is nice and compact, but it does not add the columns to an existing
dataframe
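
for reference, one way to keep the existing columns with this approach is to
select col("*") alongside the struct and then expand it. this is only a sketch:
it assumes spark 2.x or later (SparkSession), and the object name, the helper
addOutputs, the udf body and the output names x and y are all made up for
illustration. it still goes through a temporary struct column "r", so it is
close to the withColumn/drop version quoted below, and whether the udf runs
once or twice depends on how the optimizer collapses the two projections, so
it is worth checking the plan with explain().

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object TwoOutputs {
  // Hypothetical stand-in for the real udf: it returns a Tuple2, which Spark
  // represents as a struct column with fields _1 and _2.
  val func = udf((i: Int) => (i, i * 2))

  // Keeps every existing column of df and appends x and y, taken from the
  // two fields of the struct produced by the udf on column "a".
  def addOutputs(df: DataFrame): DataFrame =
    df
      .select(col("*"), func(col("a")).as("r"))                   // existing columns + struct
      .select(col("*"), col("r._1").as("x"), col("r._2").as("y")) // expand the struct
      .drop("r")                                                  // remove the temporary struct

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("two-outputs")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 0), (2, 5)).toDF("a", "b")
    val result = addOutputs(df)

    result.explain() // inspect how many times the udf appears in the physical plan
    result.show()

    spark.stop()
  }
}
```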

On Wed, May 25, 2016 at 11:39 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> Hi,
>
> How about this?
> --
> val func = udf((i: Int) => Tuple2(i, i))
> val df = Seq((1, 0), (2, 5)).toDF("a", "b")
> df.select(func($"a").as("r")).select($"r._1", $"r._2")
>
> // maropu
>
>
> On Thu, May 26, 2016 at 5:11 AM, Koert Kuipers <koert@tresata.com> wrote:
>
>> hello all,
>>
>> i have a single udf that creates 2 outputs (so a tuple 2). i would like
>> to add these 2 columns to my dataframe.
>>
>> my current solution is along these lines:
>> df
>>   .withColumn("_temp_", udf(inputColumns))
>>   .withColumn("x", col("_temp_")("_1"))
>>   .withColumn("y", col("_temp_")("_2"))
>>   .drop("_temp_")
>>
>> this works, but it's not pretty with the temporary field stuff.
>>
>> i also tried this:
>> val tmp = udf(inputColumns)
>> df
>>   .withColumn("x", tmp("_1"))
>>   .withColumn("y", tmp("_2"))
>>
>> this also works, but unfortunately the udf is evaluated twice
>>
>> is there a better way to do this?
>>
>> thanks! koert
>>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
